Overview

Dataset statistics

Number of variables: 46
Number of observations: 407684
Missing cells: 3871070
Missing cells (%): 20.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 143.1 MiB
Average record size in memory: 368.0 B
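
The derived figures in this block can be recomputed from the raw counts. The sketch below is only a consistency check on the numbers reported above; the variable names are made up for illustration, and it assumes that the missing-cell percentage is taken over variables × observations and that 1 MiB is 1024² bytes.

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 46
n_observations = 407_684
missing_cells = 3_871_070
total_size_mib = 143.1

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024**2 bytes; the reported size is already rounded,
# so the result only approximates the reported 368.0 B.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"total cells:         {total_cells}")
print(f"missing cells (%):   {missing_pct:.1f}")
print(f"average record size: about {avg_record_bytes:.0f} B")
```

Both derived values agree with the report to within the rounding of the published figures.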

Variable types

Numeric: 11
Categorical: 34
Boolean: 1

Alerts

ori has constant value "CA0371100" Constant
agency has constant value "SD" Constant
gendnc_code has constant value "5.0" Constant
id has a high cardinality: 407684 distinct values High cardinality
date has a high cardinality: 912 distinct values High cardinality
time has a high cardinality: 77771 distinct values High cardinality
inters has a high cardinality: 15939 distinct values High cardinality
street has a high cardinality: 44668 distinct values High cardinality
hw_exit has a high cardinality: 2211 distinct values High cardinality
school_name has a high cardinality: 85 distinct values High cardinality
beat_name has a high cardinality: 127 distinct values High cardinality
disability has a high cardinality: 134 distinct values High cardinality
reason_text has a high cardinality: 1697 distinct values High cardinality
reason_detail has a high cardinality: 282 distinct values High cardinality
reason_exp has a high cardinality: 183583 distinct values High cardinality
search_basis has a high cardinality: 721 distinct values High cardinality
search_basis_exp has a high cardinality: 28990 distinct values High cardinality
prop_type has a high cardinality: 490 distinct values High cardinality
cont has a high cardinality: 669 distinct values High cardinality
actions has a high cardinality: 11672 distinct values High cardinality
act_consent has a high cardinality: 335 distinct values High cardinality
df_index is highly correlated with Unnamed: 0 High correlation
Unnamed: 0 is highly correlated with df_index High correlation
is_school is highly correlated with is_student High correlation
is_student is highly correlated with is_school High correlation
df_index is highly correlated with Unnamed: 0 and 3 other fields High correlation
Unnamed: 0 is highly correlated with df_index and 3 other fields High correlation
stop_id is highly correlated with df_index and 3 other fields High correlation
exp_years is highly correlated with ldmk and 1 other fields High correlation
dur is highly correlated with school_name and 1 other fields High correlation
is_serv is highly correlated with ldmk and 1 other fields High correlation
assign_key is highly correlated with assign_words and 2 other fields High correlation
assign_words is highly correlated with assign_key and 2 other fields High correlation
ldmk is highly correlated with df_index and 15 other fields High correlation
is_school is highly correlated with is_student High correlation
school_name is highly correlated with df_index and 15 other fields High correlation
city is highly correlated with ldmk and 1 other fields High correlation
beat is highly correlated with ldmk and 1 other fields High correlation
is_student is highly correlated with is_school and 2 other fields High correlation
lim_eng is highly correlated with ldmk High correlation
age is highly correlated with ldmk and 1 other fields High correlation
gender_words is highly correlated with ldmk and 1 other fields High correlation
is_gendnc is highly correlated with gender_code High correlation
gender_code is highly correlated with ldmk and 2 other fields High correlation
lgbt is highly correlated with ldmk High correlation
race is highly correlated with ldmk and 1 other fields High correlation
reason_words is highly correlated with school_name and 3 other fields High correlation
reasonid is highly correlated with ldmk and 2 other fields High correlation
seiz_basis is highly correlated with dur and 2 other fields High correlation
inters has 366868 (90.0%) missing values Missing
block has 43330 (10.6%) missing values Missing
ldmk has 407643 (> 99.9%) missing values Missing
street has 16834 (4.1%) missing values Missing
hw_exit has 404618 (99.2%) missing values Missing
school_name has 407362 (99.9%) missing values Missing
gendnc_code has 407507 (> 99.9%) missing values Missing
reasonid has 18844 (4.6%) missing values Missing
reason_text has 18844 (4.6%) missing values Missing
reason_detail has 18838 (4.6%) missing values Missing
search_basis has 321160 (78.8%) missing values Missing
search_basis_exp has 344258 (84.4%) missing values Missing
seiz_basis has 398568 (97.8%) missing values Missing
prop_type has 398568 (97.8%) missing values Missing
act_consent has 297641 (73.0%) missing values Missing
block is highly skewed (γ1 = 254.9436976) Skewed
id is uniformly distributed Uniform
ldmk is uniformly distributed Uniform
id has unique values Unique
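
As a rough illustration of how alerts like the ones above could be derived from per-column summaries, here is a minimal sketch. It is not the profiler's own implementation: the function name and the thresholds (50 distinct values for high cardinality, an absolute skewness of 20, any missing value at all) are assumptions chosen for this example, not values read from config.json. The correlation and uniformity alerts are not modelled.

```python
# Illustrative thresholds; assumptions for this sketch only.
CARDINALITY_THRESHOLD = 50
SKEWNESS_THRESHOLD = 20
N_ROWS = 407_684  # number of observations in this report


def alerts_for(kind, distinct=None, missing=0, skewness=None):
    """Return alert labels for one column summary (hypothetical helper)."""
    found = []
    if distinct == 1:
        found.append("Constant")
    if kind == "categorical" and distinct is not None and distinct > CARDINALITY_THRESHOLD:
        found.append("High cardinality")
    if missing > 0:
        found.append("Missing")
    if skewness is not None and abs(skewness) > SKEWNESS_THRESHOLD:
        found.append("Skewed")
    if distinct == N_ROWS and missing == 0:
        found.append("Unique")
    return found


# Column summaries taken from the alerts listed above.
print("ori:        ", alerts_for("categorical", distinct=1))
print("id:         ", alerts_for("categorical", distinct=407_684))
print("school_name:", alerts_for("categorical", distinct=85, missing=407_362))
print("block:      ", alerts_for("numeric", missing=43_330, skewness=254.9436976))
```

Under these assumed thresholds the sketch reproduces the Constant, High cardinality, Missing, Skewed and Unique labels that the report lists for these four columns.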

Reproduction

Analysis started: 2023-03-29 06:31:19.748200
Analysis finished: 2023-03-29 06:33:33.358212
Duration: 2 minutes and 13.61 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
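
The reported duration follows directly from the two timestamps. A small sketch using only the standard library; the format string is an assumption about how the timestamps are laid out in this report.

```python
from datetime import datetime

# Assumed layout of the timestamps shown in the Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-03-29 06:31:19.748200", FMT)
finished = datetime.strptime("2023-03-29 06:33:33.358212", FMT)

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```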

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 187251
Distinct (%): 45.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 76802.8756
Minimum: 0
Maximum: 187250
Zeros: 3
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 6794.15
Q1: 33973
median: 67947
Q3: 117974
95-th percentile: 166865.85
Maximum: 187250
Range: 187250
Interquartile range (IQR): 84001

Descriptive statistics

Standard deviation: 50412.73086
Coefficient of variation (CV): 0.6563911893
Kurtosis: -0.9762821719
Mean: 76802.8756
Median Absolute Deviation (MAD): 40395
Skewness: 0.354725747
Sum: 3.131130354 × 10^10
Variance: 2541443433
Monotonicity: Not monotonic
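
Several of the df_index statistics are functions of the others, which gives a quick way to sanity-check the block. The sketch below recomputes them from the reported (rounded) values; the variable names are illustrative, and small differences from the published digits are expected because the inputs are already rounded.

```python
import math

# Values copied from the df_index statistics above.
n = 407_684
mean, std = 76802.8756, 50412.73086
minimum, maximum = 0, 187250
q1, q3 = 33973, 117974

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n                  # Sum (df_index has no missing values)

print(f"range    = {value_range}")
print(f"IQR      = {iqr}")
print(f"CV       = {cv:.6f}")
print(f"variance = {variance:.4e}")
print(f"sum      = {total:.4e}")
```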
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
0                      3       < 0.1%
64668                  3       < 0.1%
52378                  3       < 0.1%
54425                  3       < 0.1%
56472                  3       < 0.1%
42135                  3       < 0.1%
44182                  3       < 0.1%
46229                  3       < 0.1%
48276                  3       < 0.1%
33939                  3       < 0.1%
Other values (187241)  407654  > 99.9%

Value   Count  Frequency (%)
0       3      < 0.1%
1       3      < 0.1%
2       3      < 0.1%
3       3      < 0.1%
4       3      < 0.1%
5       3      < 0.1%
6       3      < 0.1%
7       3      < 0.1%
8       3      < 0.1%
9       3      < 0.1%

Value   Count  Frequency (%)
187250  1      < 0.1%
187249  1      < 0.1%
187248  1      < 0.1%
187247  1      < 0.1%
187246  1      < 0.1%
187245  1      < 0.1%
187244  1      < 0.1%
187243  1      < 0.1%
187242  1      < 0.1%
187241  1      < 0.1%

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 187251
Distinct (%): 45.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 76803.8756
Minimum: 1
Maximum: 187251
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 6795.15
Q1: 33974
median: 67948
Q3: 117975
95-th percentile: 166866.85
Maximum: 187251
Range: 187250
Interquartile range (IQR): 84001

Descriptive statistics

Standard deviation: 50412.73086
Coefficient of variation (CV): 0.656382643
Kurtosis: -0.9762821719
Mean: 76803.8756
Median Absolute Deviation (MAD): 40395
Skewness: 0.354725747
Sum: 3.131171122 × 10^10
Variance: 2541443433
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
2049                   3       < 0.1%
62621                  3       < 0.1%
50331                  3       < 0.1%
52378                  3       < 0.1%
54425                  3       < 0.1%
56472                  3       < 0.1%
42135                  3       < 0.1%
44182                  3       < 0.1%
46229                  3       < 0.1%
48276                  3       < 0.1%
Other values (187241)  407654  > 99.9%

Value   Count  Frequency (%)
1       3      < 0.1%
2       3      < 0.1%
3       3      < 0.1%
4       3      < 0.1%
5       3      < 0.1%
6       3      < 0.1%
7       3      < 0.1%
8       3      < 0.1%
9       3      < 0.1%
10      3      < 0.1%

Value   Count  Frequency (%)
187251  1      < 0.1%
187250  1      < 0.1%
187249  1      < 0.1%
187248  1      < 0.1%
187247  1      < 0.1%
187246  1      < 0.1%
187245  1      < 0.1%
187244  1      < 0.1%
187243  1      < 0.1%
187242  1      < 0.1%

stop_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 353547
Distinct (%): 86.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 269011.3497
Minimum: 84362
Maximum: 449933
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 84362
5-th percentile: 106360.15
Q1: 177796.75
median: 269751.5
Q3: 359576.25
95-th percentile: 431524.85
Maximum: 449933
Range: 365571
Interquartile range (IQR): 181779.5

Descriptive statistics

Standard deviation: 104491.5027
Coefficient of variation (CV): 0.3884278593
Kurtosis: -1.196648741
Mean: 269011.3497
Median Absolute Deviation (MAD): 90905
Skewness: -0.006532642956
Sum: 1.096716231 × 10^11
Variance: 1.091847413 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
174011                 52      < 0.1%
184085                 48      < 0.1%
180326                 46      < 0.1%
169932                 42      < 0.1%
183655                 40      < 0.1%
161095                 39      < 0.1%
174472                 38      < 0.1%
236965                 35      < 0.1%
170316                 34      < 0.1%
183646                 32      < 0.1%
Other values (353537)  407278  99.9%

Value   Count  Frequency (%)
84362   1      < 0.1%
84364   1      < 0.1%
84365   1      < 0.1%
84366   1      < 0.1%
84369   1      < 0.1%
84370   1      < 0.1%
84371   1      < 0.1%
84372   2      < 0.1%
84373   1      < 0.1%
84374   1      < 0.1%

Value   Count  Frequency (%)
449933  1      < 0.1%
449726  1      < 0.1%
449716  1      < 0.1%
449709  1      < 0.1%
449701  1      < 0.1%
449694  1      < 0.1%
449693  2      < 0.1%
449692  1      < 0.1%
449687  3      < 0.1%
449675  1      < 0.1%

pid
Real number (ℝ≥0)

Distinct: 52
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.26214421
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 2
Maximum: 52
Range: 51
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.224532182
Coefficient of variation (CV): 0.9701998971
Kurtosis: 338.2990747
Mean: 1.26214421
Median Absolute Deviation (MAD): 0
Skewness: 14.70829698
Sum: 514556
Variance: 1.499479066
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count   Frequency (%)
1                  353540  86.7%
2                  35536   8.7%
3                  9722    2.4%
4                  3793    0.9%
5                  1715    0.4%
6                  889     0.2%
7                  531     0.1%
8                  360     0.1%
9                  257     0.1%
10                 210     0.1%
Other values (42)  1131    0.3%

Value  Count   Frequency (%)
1      353540  86.7%
2      35536   8.7%
3      9722    2.4%
4      3793    0.9%
5      1715    0.4%
6      889     0.2%
7      531     0.1%
8      360     0.1%
9      257     0.1%
10     210     0.1%

Value  Count  Frequency (%)
52     1      < 0.1%
51     1      < 0.1%
50     1      < 0.1%
49     1      < 0.1%
48     2      < 0.1%
47     2      < 0.1%
46     3      < 0.1%
45     3      < 0.1%
44     3      < 0.1%
43     3      < 0.1%

id
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 407684
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

Value                  Count
397553_1               1
325762_1               1
194574_1               1
305336_1               1
440720_1               1
Other values (407679)  407679

Length

Max length: 9
Median length: 8
Mean length: 7.965208348
Min length: 7

Characters and Unicode

Total characters: 3247288
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 407684
Unique (%): 100.0%

Sample

1st row: 84362_1
2nd row: 84364_1
3rd row: 84365_1
4th row: 84366_1
5th row: 84369_1
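
The sample rows suggest that each id is a stop number and a per-stop person number joined by an underscore: the first sample, 84362_1, begins with the stop_id minimum (84362), and the character statistics report exactly one underscore per row. This reading is an inference from the report, not something it states. Under that assumption, a hypothetical helper to split the two parts could look like this:

```python
from typing import Tuple


def split_id(record_id: str) -> Tuple[int, int]:
    """Split an id such as '84362_1' into (stop number, person number).

    Assumes the '<stop_id>_<pid>' layout inferred from the sample rows.
    """
    stop_part, _, person_part = record_id.partition("_")
    return int(stop_part), int(person_part)


# Values that appear in this report's id listings.
samples = ["84362_1", "84364_1", "155359_2"]
parsed = [split_id(s) for s in samples]
print(parsed)
```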

Common Values

Value                  Count   Frequency (%)
397553_1               1       < 0.1%
325762_1               1       < 0.1%
194574_1               1       < 0.1%
305336_1               1       < 0.1%
440720_1               1       < 0.1%
447632_1               1       < 0.1%
170883_1               1       < 0.1%
132869_1               1       < 0.1%
133994_1               1       < 0.1%
114912_1               1       < 0.1%
Other values (407674)  407674  > 99.9%

Length

Histogram of lengths of the category
Value                  Count   Frequency (%)
397553_1               1       < 0.1%
374126_1               1       < 0.1%
155359_2               1       < 0.1%
429671_1               1       < 0.1%
114522_1               1       < 0.1%
345194_1               1       < 0.1%
292295_1               1       < 0.1%
245032_2               1       < 0.1%
208957_1               1       < 0.1%
344514_2               1       < 0.1%
Other values (407674)  407674  > 99.9%

Most occurring characters

Value  Count   Frequency (%)
1      674216  20.8%
_      407684  12.6%
2      353904  10.9%
3      332496  10.2%
4      269023  8.3%
9      209563  6.5%
0      205642  6.3%
8      203747  6.3%
6      198566  6.1%
5      196463  6.1%

Most occurring categories

Value                  Count    Frequency (%)
Decimal Number         2839604  87.4%
Connector Punctuation  407684   12.6%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      674216  23.7%
2      353904  12.5%
3      332496  11.7%
4      269023  9.5%
9      209563  7.4%
0      205642  7.2%
8      203747  7.2%
6      198566  7.0%
5      196463  6.9%
7      195984  6.9%
Connector Punctuation
Value  Count   Frequency (%)
_      407684  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  3247288  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
1      674216  20.8%
_      407684  12.6%
2      353904  10.9%
3      332496  10.2%
4      269023  8.3%
9      209563  6.5%
0      205642  6.3%
8      203747  6.3%
6      198566  6.1%
5      196463  6.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3247288  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      674216  20.8%
_      407684  12.6%
2      353904  10.9%
3      332496  10.2%
4      269023  8.3%
9      209563  6.5%
0      205642  6.3%
8      203747  6.3%
6      198566  6.1%
5      196463  6.1%

ori
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

Value      Count
CA0371100  407684

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 3669156
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CA0371100
2nd row: CA0371100
3rd row: CA0371100
4th row: CA0371100
5th row: CA0371100

Common Values

Value      Count   Frequency (%)
CA0371100  407684  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value      Count   Frequency (%)
ca0371100  407684  100.0%

Most occurring characters

Value  Count    Frequency (%)
0      1223052  33.3%
1      815368   22.2%
C      407684   11.1%
A      407684   11.1%
3      407684   11.1%
7      407684   11.1%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    2853788  77.8%
Uppercase Letter  815368   22.2%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      1223052  42.9%
1      815368   28.6%
3      407684   14.3%
7      407684   14.3%
Uppercase Letter
Value  Count   Frequency (%)
C      407684  50.0%
A      407684  50.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  2853788  77.8%
Latin   815368   22.2%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      1223052  42.9%
1      815368   28.6%
3      407684   14.3%
7      407684   14.3%
Latin
Value  Count   Frequency (%)
C      407684  50.0%
A      407684  50.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3669156  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      1223052  33.3%
1      815368   22.2%
C      407684   11.1%
A      407684   11.1%
3      407684   11.1%
7      407684   11.1%

agency
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

Value  Count
SD     407684

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 815368
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SD
2nd row: SD
3rd row: SD
4th row: SD
5th row: SD

Common Values

Value  Count   Frequency (%)
SD     407684  100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
sd     407684  100.0%

Most occurring characters

Value  Count   Frequency (%)
S      407684  50.0%
D      407684  50.0%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  815368  100.0%

Most frequent character per category

Uppercase Letter
Value  Count   Frequency (%)
S      407684  50.0%
D      407684  50.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  815368  100.0%

Most frequent character per script

Latin
Value  Count   Frequency (%)
S      407684  50.0%
D      407684  50.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  815368  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
S      407684  50.0%
D      407684  50.0%

exp_years
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 40
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.275789582
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 3
Q3: 10
95-th percentile: 21
Maximum: 50
Range: 49
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 7.089598806
Coefficient of variation (CV): 1.129674396
Kurtosis: 2.149779906
Mean: 6.275789582
Median Absolute Deviation (MAD): 2
Skewness: 1.588415896
Sum: 2558539
Variance: 50.26241123
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)
Value              Count   Frequency (%)
1                  152249  37.3%
3                  33347   8.2%
2                  30587   7.5%
5                  30179   7.4%
4                  24049   5.9%
10                 16650   4.1%
11                 12187   3.0%
18                 11768   2.9%
9                  10901   2.7%
12                 9835    2.4%
Other values (30)  75932   18.6%

Value  Count   Frequency (%)
1      152249  37.3%
2      30587   7.5%
3      33347   8.2%
4      24049   5.9%
5      30179   7.4%
6      9370    2.3%
7      4610    1.1%
8      5255    1.3%
9      10901   2.7%
10     16650   4.1%

Value  Count  Frequency (%)
50     4      < 0.1%
49     23     < 0.1%
48     231    0.1%
45     33     < 0.1%
37     2      < 0.1%
35     1      < 0.1%
34     1      < 0.1%
33     35     < 0.1%
32     197    < 0.1%
31     88     < 0.1%

date
Categorical

HIGH CARDINALITY

Distinct: 912
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

Value               Count
2020-02-12          799
2019-05-23          793
2020-02-11          791
2019-07-06          755
2020-01-16          749
Other values (907)  403797

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 4076840
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019-01-01
2nd row: 2019-01-01
3rd row: 2019-01-01
4th row: 2019-01-01
5th row: 2019-01-01

Common Values

Value               Count   Frequency (%)
2020-02-12          799     0.2%
2019-05-23          793     0.2%
2020-02-11          791     0.2%
2019-07-06          755     0.2%
2020-01-16          749     0.2%
2019-10-23          734     0.2%
2019-09-24          733     0.2%
2019-08-21          722     0.2%
2019-10-02          715     0.2%
2019-03-27          712     0.2%
Other values (902)  400181  98.2%

Length

Histogram of lengths of the category
Value               Count   Frequency (%)
2020-02-12          799     0.2%
2019-05-23          793     0.2%
2020-02-11          791     0.2%
2019-07-06          755     0.2%
2020-01-16          749     0.2%
2019-10-23          734     0.2%
2019-09-24          733     0.2%
2019-08-21          722     0.2%
2019-10-02          715     0.2%
2019-03-27          712     0.2%
Other values (902)  400181  98.2%

Most occurring characters

Value  Count    Frequency (%)
0      1073720  26.3%
2      865923   21.2%
-      815368   20.0%
1      591994   14.5%
9      254877   6.3%
3      102764   2.5%
5      80571    2.0%
4      79204    1.9%
6      74134    1.8%
7      69263    1.7%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    3261472  80.0%
Dash Punctuation  815368   20.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      1073720  32.9%
2      865923   26.6%
1      591994   18.2%
9      254877   7.8%
3      102764   3.2%
5      80571    2.5%
4      79204    2.4%
6      74134    2.3%
7      69263    2.1%
8      69022    2.1%
Dash Punctuation
Value  Count   Frequency (%)
-      815368  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  4076840  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      1073720  26.3%
2      865923   21.2%
-      815368   20.0%
1      591994   14.5%
9      254877   6.3%
3      102764   2.5%
5      80571    2.0%
4      79204    1.9%
6      74134    1.8%
7      69263    1.7%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  4076840  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      1073720  26.3%
2      865923   21.2%
-      815368   20.0%
1      591994   14.5%
9      254877   6.3%
3      102764   2.5%
5      80571    2.0%
4      79204    1.9%
6      74134    1.8%
7      69263    1.7%

time
Categorical

HIGH CARDINALITY

Distinct: 77771
Distinct (%): 19.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

Value                 Count
16:00:00              1122
10:00:00              982
15:00:00              976
08:00:00              976
11:00:00              941
Other values (77766)  402687

Length

Max length: 19
Median length: 8
Mean length: 8.002482315
Min length: 8

Characters and Unicode

Total characters: 3262484
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 14534
Unique (%): 3.6%

Sample

1st row: 00:15:07
2nd row: 00:15:16
3rd row: 00:02:00
4th row: 00:38:00
5th row: 01:06:41

Common Values

Value                 Count   Frequency (%)
16:00:00              1122    0.3%
10:00:00              982     0.2%
15:00:00              976     0.2%
08:00:00              976     0.2%
11:00:00              941     0.2%
09:00:00              936     0.2%
22:00:00              914     0.2%
17:00:00              900     0.2%
15:30:00              817     0.2%
07:00:00              800     0.2%
Other values (77761)  398320  97.7%

Length

Histogram of lengths of the category
Value                 Count   Frequency (%)
16:00:00              1122    0.3%
10:00:00              982     0.2%
15:00:00              976     0.2%
08:00:00              976     0.2%
11:00:00              941     0.2%
09:00:00              936     0.2%
22:00:00              914     0.2%
17:00:00              900     0.2%
15:30:00              817     0.2%
07:00:00              800     0.2%
Other values (77754)  398412  97.7%

Most occurring characters

Value             Count   Frequency (%)
:                 815368  25.0%
0                 690766  21.2%
1                 418785  12.8%
2                 291617  8.9%
5                 228173  7.0%
3                 221424  6.8%
4                 196101  6.0%
8                 104305  3.2%
7                 100548  3.1%
9                 100195  3.1%
Other values (3)  95202   2.9%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     2446840  75.0%
Other Punctuation  815368   25.0%
Dash Punctuation   184      < 0.1%
Space Separator    92       < 0.1%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      690766  28.2%
1      418785  17.1%
2      291617  11.9%
5      228173  9.3%
3      221424  9.0%
4      196101  8.0%
8      104305  4.3%
7      100548  4.1%
9      100195  4.1%
6      94926   3.9%
Other Punctuation
Value  Count   Frequency (%)
:      815368  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      184    100.0%
Space Separator
Value    Count  Frequency (%)
(space)  92     100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  3262484  100.0%

Most frequent character per script

Common
Value             Count   Frequency (%)
:                 815368  25.0%
0                 690766  21.2%
1                 418785  12.8%
2                 291617  8.9%
5                 228173  7.0%
3                 221424  6.8%
4                 196101  6.0%
8                 104305  3.2%
7                 100548  3.1%
9                 100195  3.1%
Other values (3)  95202   2.9%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3262484  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
:                 815368  25.0%
0                 690766  21.2%
1                 418785  12.8%
2                 291617  8.9%
5                 228173  7.0%
3                 221424  6.8%
4                 196101  6.0%
8                 104305  3.2%
7                 100548  3.1%
9                 100195  3.1%
Other values (3)  95202   2.9%
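
The length and character statistics for time hint at a formatting inconsistency: the maximum length is 19 rather than 8, and the column contains 184 dashes and 92 spaces. That is consistent with 92 values carrying a full date-and-time string (two dashes, one space, 19 characters) instead of a bare HH:MM:SS. The sketch below first checks that reading against the reported total character count, then shows one possible clean-up. The helper name and the example long value are hypothetical; the exact layout of the 92 long values is an assumption, since the report does not show them.

```python
N_ROWS = 407_684
LONG_VALUES = 92             # assumed: one space per long value (Space Separator count)
SHORT_LEN, LONG_LEN = 8, 19  # min and max length from the Length block

# If the reading is right, this should equal the reported "Total characters".
expected_total = SHORT_LEN * (N_ROWS - LONG_VALUES) + LONG_LEN * LONG_VALUES
print(expected_total)


def normalize_time(raw: str) -> str:
    """Keep only the trailing HH:MM:SS part of a time value (hypothetical helper)."""
    return raw.strip()[-SHORT_LEN:]


print(normalize_time("00:15:07"))             # value from the Sample block
print(normalize_time("2019-01-01 00:15:07"))  # made-up example of a long value
```

The computed total matches the 3262484 characters reported for the column, which supports the interpretation without proving it.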

dur
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 337
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.57985597
Minimum: 1
Maximum: 1440
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 10
median: 15
Q3: 30
95-th percentile: 120
Maximum: 1440
Range: 1439
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 49.79122848
Coefficient of variation (CV): 1.742179126
Kurtosis: 182.3002247
Mean: 28.57985597
Median Absolute Deviation (MAD): 6
Skewness: 9.249568162
Sum: 11651550
Variance: 2479.166434
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
10                  99603  24.4%
15                  49427  12.1%
5                   42258  10.4%
20                  41432  10.2%
30                  28548  7.0%
60                  17833  4.4%
8                   13319  3.3%
120                 12718  3.1%
6                   12384  3.0%
7                   9335   2.3%
Other values (327)  80827  19.8%

Value  Count  Frequency (%)
1      1035   0.3%
2      2654   0.7%
3      2755   0.7%
4      2327   0.6%
5      42258  10.4%
6      12384  3.0%
7      9335   2.3%
8      13319  3.3%
9      4340   1.1%
10     99603  24.4%

Value  Count  Frequency (%)
1440   52     < 0.1%
1422   1      < 0.1%
1400   24     < 0.1%
1355   1      < 0.1%
1330   2      < 0.1%
1301   1      < 0.1%
1300   3      < 0.1%
1230   1      < 0.1%
1220   1      < 0.1%
1210   4      < 0.1%

is_serv
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

Value  Count
0      363639
1      44045

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 407684
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 1

Common Values

Value  Count   Frequency (%)
0      363639  89.2%
1      44045   10.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
0      363639  89.2%
1      44045   10.8%

Most occurring characters

Value  Count   Frequency (%)
0      363639  89.2%
1      44045   10.8%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  407684  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      363639  89.2%
1      44045   10.8%

Most occurring scripts

Value   Count   Frequency (%)
Common  407684  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      363639  89.2%
1      44045   10.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  407684  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      363639  89.2%
1      44045   10.8%

assign_key
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.439082721
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 1
95-th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.820493846
Coefficient of variation (CV): 1.265037666
Kurtosis: 16.05460133
Mean: 1.439082721
Median Absolute Deviation (MAD): 0
Skewness: 4.194914881
Sum: 586691
Variance: 3.314197843
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value  Count   Frequency (%)
1      378571  92.9%
10     13106   3.2%
2      7104    1.7%
9      3700    0.9%
5      1624    0.4%
7      1247    0.3%
6      802     0.2%
4      626     0.2%
8      535     0.1%
3      369     0.1%

Value  Count   Frequency (%)
1      378571  92.9%
2      7104    1.7%
3      369     0.1%
4      626     0.2%
5      1624    0.4%
6      802     0.2%
7      1247    0.3%
8      535     0.1%
9      3700    0.9%
10     13106   3.2%

Value  Count   Frequency (%)
10     13106   3.2%
9      3700    0.9%
8      535     0.1%
7      1247    0.3%
6      802     0.2%
5      1624    0.4%
4      626     0.2%
3      369     0.1%
2      7104    1.7%
1      378571  92.9%

assign_words
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

Value                                          Count
Patrol, traffic enforcement, field operations  378571
Other                                          13106
Gang enforcement                               7104
Investigative/detective                        3700
Roadblock or DUI sobriety checkpoint           1624
Other values (5)                               3579

Length

Max length: 78
Median length: 45
Mean length: 42.77467107
Min length: 5

Characters and Unicode

Total characters: 17438549
Distinct characters: 39
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Patrol, traffic enforcement, field operations
2nd row: Patrol, traffic enforcement, field operations
3rd row: Patrol, traffic enforcement, field operations
4th row: Patrol, traffic enforcement, field operations
5th row: Patrol, traffic enforcement, field operations

Common Values

Value                                          Count   Frequency (%)
Patrol, traffic enforcement, field operations  378571  92.9%
Other                                          13106   3.2%
Gang enforcement                               7104    1.7%
Investigative/detective                        3700    0.9%
Roadblock or DUI sobriety checkpoint           1624    0.4%
Task force                                     1247    0.3%
Narcotics/vice                                 802     0.2%
Special events                                 626     0.2%
K1-12 public school inlcuding school resource officer or school police officer  535  0.1%
Compliance check                               369     0.1%
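
The counts in this table match the assign_key counts one for one (for example, 378571 rows for key 1 and for the patrol label), which is consistent with the High correlation alert between the two columns and suggests that assign_key is simply a numeric code for assign_words. The sketch below pairs codes with labels by matching counts. The resulting mapping is an inference from this report, not a documented codebook, and the labels are copied verbatim from the data, including their original spelling.

```python
# Counts per assign_key value, as reported for that variable.
key_counts = {1: 378571, 2: 7104, 3: 369, 4: 626, 5: 1624,
              6: 802, 7: 1247, 8: 535, 9: 3700, 10: 13106}

# Counts per assign_words label, from the Common Values table.
word_counts = {
    "Patrol, traffic enforcement, field operations": 378571,
    "Other": 13106,
    "Gang enforcement": 7104,
    "Investigative/detective": 3700,
    "Roadblock or DUI sobriety checkpoint": 1624,
    "Task force": 1247,
    "Narcotics/vice": 802,
    "Special events": 626,
    "K1-12 public school inlcuding school resource officer or school police officer": 535,
    "Compliance check": 369,
}

# Every count is distinct, so matching on the count is unambiguous here.
by_count = {count: label for label, count in word_counts.items()}
inferred = {key: by_count[count] for key, count in sorted(key_counts.items())}

for key, label in inferred.items():
    print(f"{key:>2}  {label}")
```

Confirming the pairing would require the raw data or the dataset's documentation.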

Length

Histogram of lengths of the category

Category Frequency Plot

Value                    Count   Frequency (%)
enforcement              385675  19.8%
patrol                   378571  19.5%
field                    378571  19.5%
operations               378571  19.5%
traffic                  378571  19.5%
other                    13106   0.7%
gang                     7104    0.4%
investigative/detective  3700    0.2%
or                       2159    0.1%
roadblock                1624    0.1%
Other values (17)        15508   0.8%

Most occurring characters

Value              Count    Frequency (%)
e                  1956361  11.2%
t                  1553970  8.9%
r                  1542466  8.8%
o                  1537811  8.8%
(space)            1535476  8.8%
f                  1524775  8.7%
n                  1164414  6.7%
i                  1155870  6.6%
a                  1151185  6.6%
c                  783019   4.5%
Other values (29)  3533202  20.3%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   14726733  84.4%
Space Separator    1535476   8.8%
Other Punctuation  761644    4.4%
Uppercase Letter   412556    2.4%
Decimal Number     1605      < 0.1%
Dash Punctuation   535       < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count    Frequency (%)
e                  1956361  13.3%
t                  1553970  10.6%
r                  1542466  10.5%
o                  1537811  10.4%
f                  1524775  10.4%
n                  1164414  7.9%
i                  1155870  7.8%
a                  1151185  7.8%
c                  783019   5.3%
l                  762971   5.2%
Other values (11)  1593891  10.8%
Uppercase Letter
Value             Count   Frequency (%)
P                 378571  91.8%
O                 13106   3.2%
G                 7104    1.7%
I                 5324    1.3%
R                 1624    0.4%
D                 1624    0.4%
U                 1624    0.4%
T                 1247    0.3%
N                 802     0.2%
S                 626     0.2%
Other values (2)  904     0.2%
Other Punctuation
Value  Count   Frequency (%)
,      757142  99.4%
/      4502    0.6%
Decimal Number
Value  Count  Frequency (%)
1      1070   66.7%
2      535    33.3%
Space Separator
Value    Count    Frequency (%)
(space)  1535476  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      535    100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   15139289  86.8%
Common  2299260   13.2%

Most frequent character per script

Latin
Value              Count    Frequency (%)
e                  1956361  12.9%
t                  1553970  10.3%
r                  1542466  10.2%
o                  1537811  10.2%
f                  1524775  10.1%
n                  1164414  7.7%
i                  1155870  7.6%
a                  1151185  7.6%
c                  783019   5.2%
l                  762971   5.0%
Other values (23)  2006447  13.3%
Common
Value    Count    Frequency (%)
(space)  1535476  66.8%
,        757142   32.9%
/        4502     0.2%
1        1070     < 0.1%
-        535      < 0.1%
2        535      < 0.1%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  17438549  100.0%

Most frequent character per block

ASCII
Value              Count    Frequency (%)
e                  1956361  11.2%
t                  1553970  8.9%
r                  1542466  8.8%
o                  1537811  8.8%
(space)            1535476  8.8%
f                  1524775  8.7%
n                  1164414  6.7%
i                  1155870  6.6%
a                  1151185  6.6%
c                  783019   4.5%
Other values (29)  3533202  20.3%

inters
Categorical

HIGH CARDINALITY
MISSING

Distinct: 15939
Distinct (%): 39.1%
Missing: 366868
Missing (%): 90.0%
Memory size: 3.1 MiB

Value                                    Count
BROADWAY                                 278
MIRAMAR WAY                              250
CAMINO DE LA PLAZA/ CAMIONES WAY         222
OTAY VALLEY ROAD/ AVENIDA DE LAS VISTAS  145
G Street                                 137
Other values (15934)                     39784

Length

Max length: 77
Median length: 59
Mean length: 13.92017836
Min length: 1

Characters and Unicode

Total characters: 568166
Distinct characters: 78
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 11202
Unique (%): 27.4%

Sample

1st row: governor dr
2nd row: la jolla village dr
3rd row: mission/hornblend
4th row: hornblend/mission blvd
5th row: clairemont mesa blvd

Common Values

ValueCountFrequency (%)
BROADWAY278
 
0.1%
MIRAMAR WAY250
 
0.1%
CAMINO DE LA PLAZA/ CAMIONES WAY222
 
0.1%
OTAY VALLEY ROAD/ AVENIDA DE LAS VISTAS145
 
< 0.1%
G Street137
 
< 0.1%
imperial129
 
< 0.1%
garnet128
 
< 0.1%
w ash127
 
< 0.1%
MARKET ST108
 
< 0.1%
PB/MB105
 
< 0.1%
Other values (15929)39187
 
9.6%
(Missing)366868
90.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
and3985
 
3.7%
st3619
 
3.4%
ave3372
 
3.2%
3273
 
3.1%
street2699
 
2.5%
beach1622
 
1.5%
rd1616
 
1.5%
mission1598
 
1.5%
blvd1541
 
1.4%
dr1353
 
1.3%
Other values (4709)81672
76.8%

Most occurring characters

ValueCountFrequency (%)
65653
 
11.6%
a30420
 
5.4%
e28576
 
5.0%
A27394
 
4.8%
r22026
 
3.9%
E20689
 
3.6%
n19236
 
3.4%
t18707
 
3.3%
R17141
 
3.0%
o16782
 
3.0%
Other values (68)301542
53.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter253618
44.6%
Uppercase Letter218264
38.4%
Space Separator65653
 
11.6%
Decimal Number19086
 
3.4%
Other Punctuation9867
 
1.7%
Dash Punctuation1667
 
0.3%
Open Punctuation5
 
< 0.1%
Close Punctuation5
 
< 0.1%
Math Symbol1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a30420
12.0%
e28576
11.3%
r22026
 
8.7%
n19236
 
7.6%
t18707
 
7.4%
o16782
 
6.6%
i16117
 
6.4%
s14379
 
5.7%
l13637
 
5.4%
d13130
 
5.2%
Other values (16)60608
23.9%
Uppercase Letter
ValueCountFrequency (%)
A27394
12.6%
E20689
 
9.5%
R17141
 
7.9%
S15712
 
7.2%
I15083
 
6.9%
N13468
 
6.2%
O12558
 
5.8%
T11194
 
5.1%
L10226
 
4.7%
C9970
 
4.6%
Other values (16)64829
29.7%
Other Punctuation
ValueCountFrequency (%)
/8725
88.4%
.577
 
5.8%
&194
 
2.0%
@181
 
1.8%
,116
 
1.2%
'59
 
0.6%
!5
 
0.1%
:5
 
0.1%
#3
 
< 0.1%
;1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
55996
31.4%
13466
18.2%
82088
 
10.9%
01848
 
9.7%
61280
 
6.7%
41200
 
6.3%
31120
 
5.9%
2971
 
5.1%
9627
 
3.3%
7490
 
2.6%
Space Separator
ValueCountFrequency (%)
65653
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1667
100.0%
Open Punctuation
ValueCountFrequency (%)
(5
100.0%
Close Punctuation
ValueCountFrequency (%)
)5
100.0%
Math Symbol
ValueCountFrequency (%)
=1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin471882
83.1%
Common96284
 
16.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a30420
 
6.4%
e28576
 
6.1%
A27394
 
5.8%
r22026
 
4.7%
E20689
 
4.4%
n19236
 
4.1%
t18707
 
4.0%
R17141
 
3.6%
o16782
 
3.6%
i16117
 
3.4%
Other values (42)254794
54.0%
Common
ValueCountFrequency (%)
65653
68.2%
/8725
 
9.1%
55996
 
6.2%
13466
 
3.6%
82088
 
2.2%
01848
 
1.9%
-1667
 
1.7%
61280
 
1.3%
41200
 
1.2%
31120
 
1.2%
Other values (16)3241
 
3.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII568166
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
65653
 
11.6%
a30420
 
5.4%
e28576
 
5.0%
A27394
 
4.8%
r22026
 
3.9%
E20689
 
3.6%
n19236
 
3.4%
t18707
 
3.3%
R17141
 
3.0%
o16782
 
3.0%
Other values (68)301542
53.1%

block
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 307
Distinct (%): 0.1%
Missing: 43330
Missing (%): 10.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 7028.818951
Minimum: 0
Maximum: 99999900
Zeros: 133
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 200
Q1: 1300
median: 3200
Q3: 4800
95-th percentile: 9600
Maximum: 99999900
Range: 99999900
Interquartile range (IQR): 3500

Descriptive statistics

Standard deviation: 321105.1911
Coefficient of variation (CV): 45.68408909
Kurtosis: 77631.69518
Mean: 7028.818951
Median Absolute Deviation (MAD): 1800
Skewness: 254.9436976
Sum: 2560978300
Variance: 1.031085438 × 10^11
Monotonicity: Not monotonic
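Several of the `block` statistics can be derived from one another, which gives a quick way to sanity-check the figures. The sketch below recomputes the IQR, the coefficient of variation, the variance, and the mean from the reported values; the variable names are illustrative and the inputs are copied from this report.

```python
import math

# Values copied from the report for the `block` variable.
n_rows = 407684
n_missing = 43330
mean = 7028.818951
std = 321105.1911
q1, q3 = 1300, 4800
total = 2560978300

n_valid = n_rows - n_missing      # non-missing observations
iqr = q3 - q1                     # interquartile range
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
mean_from_sum = total / n_valid   # mean recomputed from the reported sum

print(iqr, round(cv, 4), f"{variance:.6e}", round(mean_from_sum, 4))
print(math.isclose(mean_from_sum, mean, rel_tol=1e-6))
```

The variance comes out on the order of 10^11, and the recomputed mean agrees with the reported one, so the mean is evidently taken over non-missing rows only.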
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
100                 11707    2.9%
700                 9976     2.4%
3000                8781     2.2%
4000                8565     2.1%
1000                8233     2.0%
500                 8060     2.0%
800                 7831     1.9%
4200                7434     1.8%
4300                7320     1.8%
3800                7149     1.8%
Other values (297)  279298   68.5%
(Missing)           43330    10.6%
Value               Count    Frequency (%)
0                   133      < 0.1%
100                 11707    2.9%
200                 7056     1.7%
300                 6640     1.6%
400                 5851     1.4%
500                 8060     2.0%
600                 6673     1.6%
700                 9976     2.4%
800                 7831     1.9%
900                 6361     1.6%
Value               Count    Frequency (%)
99999900            3        < 0.1%
18007300            1        < 0.1%
9999900             70       < 0.1%
5600900             1        < 0.1%
999900              221      0.1%
520000              1        < 0.1%
180000              1        < 0.1%
154000              1        < 0.1%
147000              1        < 0.1%
140000              1        < 0.1%
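The largest `block` values include 99999900 (3 rows), 9999900 (70 rows) and 999900 (221 rows), which look like placeholder codes rather than real block numbers, and would explain the SKEWED alert. The sketch below, assuming these three codes are placeholders (this is not confirmed by the report), shows how a single such value distorts the mean while leaving the median almost unchanged. `SUSPECT_CODES` and `summarize` are hypothetical names, and the sample list is synthetic.

```python
from statistics import mean, median

# Hypothetical placeholder codes, picked from the report's largest values;
# confirm against the data dictionary before excluding anything.
SUSPECT_CODES = {999900, 9999900, 99999900}

def summarize(values, suspects=SUSPECT_CODES):
    """Compare mean and median with and without the suspected placeholder codes."""
    kept = [v for v in values if v not in suspects]
    return {
        "mean_all": mean(values),
        "mean_kept": mean(kept),
        "median_all": median(values),
        "median_kept": median(kept),
    }

# Synthetic example: four common block values plus one suspected placeholder.
blocks = [100, 700, 3000, 4000, 99999900]
stats = summarize(blocks)
print(stats)
```

This pattern is consistent with the report itself, where the median (3200) sits within the interquartile range while the mean (about 7029) exceeds Q3.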

ldmk
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct36
Distinct (%)87.8%
Missing407643
Missing (%)> 99.9%
Memory size3.1 MiB
15nb exit
i805/43rd St
 
2
North Cove Park Pacific beach
 
2
WESTBOUND STATE ROUTE-94/EUCLID AVENUE
 
1
sr905 / i805
 
1
Other values (31)
31 

Length

Max length41
Median length29
Mean length19.3902439
Min length8

Characters and Unicode

Total characters: 795
Distinct characters: 58
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 80.5%

Sample

1st rowsr905 / i805
2nd rowNorth Cove Park Pacific beach
3rd rowNorth Cove Park Pacific beach
4th rowI15 / I8
5th rowON TROLLEY IN SANTEE

Common Values

ValueCountFrequency (%)
15nb exit4
 
< 0.1%
i805/43rd St2
 
< 0.1%
North Cove Park Pacific beach2
 
< 0.1%
WESTBOUND STATE ROUTE-94/EUCLID AVENUE1
 
< 0.1%
sr905 / i8051
 
< 0.1%
CANYONSIDE PARK1
 
< 0.1%
de anza cove1
 
< 0.1%
I-15 SB AT FRIARS1
 
< 0.1%
I15 / I81
 
< 0.1%
STATE ROUTE-163/UNIVERSITY AVENUE1
 
< 0.1%
Other values (26)26
 
< 0.1%
(Missing)407643
> 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
at9
 
5.9%
8
 
5.3%
sb6
 
3.9%
park6
 
3.9%
and5
 
3.3%
i-155
 
3.3%
balboa5
 
3.3%
nb4
 
2.6%
exit4
 
2.6%
15nb4
 
2.6%
Other values (66)96
63.2%

Most occurring characters

ValueCountFrequency (%)
111
 
14.0%
A42
 
5.3%
E40
 
5.0%
T32
 
4.0%
R30
 
3.8%
B29
 
3.6%
a29
 
3.6%
I27
 
3.4%
S26
 
3.3%
N25
 
3.1%
Other values (48)404
50.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter361
45.4%
Lowercase Letter218
27.4%
Space Separator111
 
14.0%
Decimal Number80
 
10.1%
Dash Punctuation13
 
1.6%
Other Punctuation12
 
1.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A42
11.6%
E40
11.1%
T32
8.9%
R30
8.3%
B29
 
8.0%
I27
 
7.5%
S26
 
7.2%
N25
 
6.9%
O22
 
6.1%
D14
 
3.9%
Other values (12)74
20.5%
Lowercase Letter
ValueCountFrequency (%)
a29
13.3%
e22
10.1%
n18
 
8.3%
t18
 
8.3%
r17
 
7.8%
o17
 
7.8%
i14
 
6.4%
b12
 
5.5%
c11
 
5.0%
d9
 
4.1%
Other values (12)51
23.4%
Decimal Number
ValueCountFrequency (%)
525
31.2%
116
20.0%
08
 
10.0%
87
 
8.8%
96
 
7.5%
46
 
7.5%
35
 
6.2%
64
 
5.0%
22
 
2.5%
71
 
1.2%
Other Punctuation
ValueCountFrequency (%)
/7
58.3%
@5
41.7%
Space Separator
ValueCountFrequency (%)
111
100.0%
Dash Punctuation
ValueCountFrequency (%)
-13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin579
72.8%
Common216
 
27.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
A42
 
7.3%
E40
 
6.9%
T32
 
5.5%
R30
 
5.2%
B29
 
5.0%
a29
 
5.0%
I27
 
4.7%
S26
 
4.5%
N25
 
4.3%
O22
 
3.8%
Other values (34)277
47.8%
Common
ValueCountFrequency (%)
111
51.4%
525
 
11.6%
116
 
7.4%
-13
 
6.0%
08
 
3.7%
87
 
3.2%
/7
 
3.2%
96
 
2.8%
46
 
2.8%
35
 
2.3%
Other values (4)12
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII795
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
111
 
14.0%
A42
 
5.3%
E40
 
5.0%
T32
 
4.0%
R30
 
3.8%
B29
 
3.6%
a29
 
3.6%
I27
 
3.4%
S26
 
3.3%
N25
 
3.1%
Other values (48)404
50.8%

street
Categorical

HIGH CARDINALITY
MISSING

Distinct44668
Distinct (%)11.4%
Missing16834
Missing (%)4.1%
Memory size3.1 MiB
El Cajon Blvd
 
2488
el cajon blvd
 
1577
imperial ave
 
1551
imperial
 
1469
garnet
 
1367
Other values (44663)
382398 

Length

Max length43
Median length36
Mean length10.67243955
Min length1

Characters and Unicode

Total characters: 4171323
Distinct characters: 82
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 24136
Unique (%): 6.2%

Sample

1st rowUNIVERSITY
2nd rowhillside dr
3rd rowocean blvd
4th rowgarnet
5th rowcoronado

Common Values

ValueCountFrequency (%)
El Cajon Blvd2488
 
0.6%
el cajon blvd1577
 
0.4%
imperial ave1551
 
0.4%
imperial1469
 
0.4%
garnet1367
 
0.3%
university ave1270
 
0.3%
University Ave1240
 
0.3%
university1221
 
0.3%
EL CAJON BLVD1123
 
0.3%
commercial1047
 
0.3%
Other values (44658)376497
92.4%
(Missing)16834
 
4.1%
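The most common `street` values include the same street in several spellings ("El Cajon Blvd", "el cajon blvd", "EL CAJON BLVD"), which inflates the distinct count. A minimal sketch of case and whitespace normalization follows; `normalize_street` is a hypothetical helper, and the counts are copied from the table above.

```python
from collections import Counter

def normalize_street(name):
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(name.lower().split())

# Counts copied from the report's Common Values table for `street`.
reported = {
    "El Cajon Blvd": 2488,
    "el cajon blvd": 1577,
    "EL CAJON BLVD": 1123,
    "university ave": 1270,
    "University Ave": 1240,
}

merged = Counter()
for raw, count in reported.items():
    merged[normalize_street(raw)] += count

print(merged.most_common())
```

This only merges case and spacing variants. Entries such as "imperial" versus "imperial ave", or "st" versus "street", would need a separate suffix-mapping step that is not attempted here.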

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ave47927
 
6.1%
st39901
 
5.1%
street37969
 
4.9%
blvd26679
 
3.4%
avenue17957
 
2.3%
rd15152
 
1.9%
dr12807
 
1.6%
mission11774
 
1.5%
road10459
 
1.3%
el9610
 
1.2%
Other values (10930)549297
70.5%

Most occurring characters

ValueCountFrequency (%)
389073
 
9.3%
e299210
 
7.2%
a257248
 
6.2%
t210603
 
5.0%
r205794
 
4.9%
n163322
 
3.9%
A155793
 
3.7%
o151343
 
3.6%
i143971
 
3.5%
l135009
 
3.2%
Other values (72)2059957
49.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2348150
56.3%
Uppercase Letter1323188
31.7%
Space Separator389073
 
9.3%
Decimal Number99771
 
2.4%
Other Punctuation9179
 
0.2%
Dash Punctuation1695
 
< 0.1%
Open Punctuation110
 
< 0.1%
Close Punctuation108
 
< 0.1%
Modifier Symbol46
 
< 0.1%
Math Symbol2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e299210
12.7%
a257248
11.0%
t210603
 
9.0%
r205794
 
8.8%
n163322
 
7.0%
o151343
 
6.4%
i143971
 
6.1%
l135009
 
5.7%
s132042
 
5.6%
v100060
 
4.3%
Other values (16)549548
23.4%
Uppercase Letter
ValueCountFrequency (%)
A155793
11.8%
E128833
 
9.7%
R111686
 
8.4%
S109975
 
8.3%
T83749
 
6.3%
N75814
 
5.7%
I72833
 
5.5%
O69409
 
5.2%
L68521
 
5.2%
D62917
 
4.8%
Other values (16)383658
29.0%
Other Punctuation
ValueCountFrequency (%)
.7202
78.5%
/1209
 
13.2%
&330
 
3.6%
#182
 
2.0%
@95
 
1.0%
'70
 
0.8%
,54
 
0.6%
:22
 
0.2%
;11
 
0.1%
\2
 
< 0.1%
Other values (2)2
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
119805
19.9%
515134
15.2%
414014
14.0%
310710
10.7%
09419
9.4%
69332
9.4%
76912
 
6.9%
26662
 
6.7%
84663
 
4.7%
93120
 
3.1%
Open Punctuation
ValueCountFrequency (%)
(108
98.2%
[2
 
1.8%
Space Separator
ValueCountFrequency (%)
389073
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1695
100.0%
Close Punctuation
ValueCountFrequency (%)
)108
100.0%
Modifier Symbol
ValueCountFrequency (%)
`46
100.0%
Math Symbol
ValueCountFrequency (%)
=2
100.0%
Currency Symbol
ValueCountFrequency (%)
$1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3671338
88.0%
Common499985
 
12.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e299210
 
8.1%
a257248
 
7.0%
t210603
 
5.7%
r205794
 
5.6%
n163322
 
4.4%
A155793
 
4.2%
o151343
 
4.1%
i143971
 
3.9%
l135009
 
3.7%
s132042
 
3.6%
Other values (42)1817003
49.5%
Common
ValueCountFrequency (%)
389073
77.8%
119805
 
4.0%
515134
 
3.0%
414014
 
2.8%
310710
 
2.1%
09419
 
1.9%
69332
 
1.9%
.7202
 
1.4%
76912
 
1.4%
26662
 
1.3%
Other values (20)11722
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII4171323
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
389073
 
9.3%
e299210
 
7.2%
a257248
 
6.2%
t210603
 
5.0%
r205794
 
4.9%
n163322
 
3.9%
A155793
 
3.7%
o151343
 
3.6%
i143971
 
3.5%
l135009
 
3.2%
Other values (72)2059957
49.4%

hw_exit
Categorical

HIGH CARDINALITY
MISSING

Distinct2211
Distinct (%)72.1%
Missing404618
Missing (%)99.2%
Memory size3.1 MiB
NB I-15
 
36
I-805/PLAZA BOULEVARD
 
31
I-805/SR-54
 
29
SR 905
 
28
SB I-15
 
26
Other values (2206)
2916 

Length

Max length60
Median length44
Mean length18.40378343
Min length2

Characters and Unicode

Total characters: 56426
Distinct characters: 74
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1907
Unique (%): 62.2%

Sample

1st rown/b 5 @ sea world
2nd rowI15NB @ AERO DR
3rd rowwB 8 @ WARING
4th row15 AT MIRAMAR
5th row15 AT 163

Common Values

Value                          Count    Frequency (%)
NB I-15                        36       < 0.1%
I-805/PLAZA BOULEVARD          31       < 0.1%
I-805/SR-54                    29       < 0.1%
SR 905                         28       < 0.1%
SB I-15                        26       < 0.1%
I-805/43RD STREET              23       < 0.1%
I-805/H STREET                 19       < 0.1%
NB 805 AT SR-163               18       < 0.1%
I-805/IMPERIAL AVENUE          14       < 0.1%
NB 805 AT MURRAY RIDGE ROAD    14       < 0.1%
Other values (2201)            2828     0.7%
(Missing)                      404618   99.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
at960
 
8.2%
15588
 
5.0%
sb515
 
4.4%
500
 
4.3%
nb440
 
3.8%
805300
 
2.6%
i-15266
 
2.3%
street229
 
2.0%
and206
 
1.8%
road185
 
1.6%
Other values (803)7487
64.1%

Most occurring characters

ValueCountFrequency (%)
8622
 
15.3%
52773
 
4.9%
A2373
 
4.2%
E2281
 
4.0%
T2227
 
3.9%
a2185
 
3.9%
R2070
 
3.7%
I1803
 
3.2%
S1730
 
3.1%
N1479
 
2.6%
Other values (64)28883
51.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter21601
38.3%
Lowercase Letter15708
27.8%
Space Separator8622
 
15.3%
Decimal Number7684
 
13.6%
Other Punctuation1495
 
2.6%
Dash Punctuation1304
 
2.3%
Open Punctuation5
 
< 0.1%
Close Punctuation5
 
< 0.1%
Math Symbol1
 
< 0.1%
Connector Punctuation1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A2373
11.0%
E2281
10.6%
T2227
10.3%
R2070
9.6%
I1803
8.3%
S1730
8.0%
N1479
 
6.8%
O1256
 
5.8%
B1235
 
5.7%
D823
 
3.8%
Other values (16)4324
20.0%
Lowercase Letter
ValueCountFrequency (%)
a2185
13.9%
r1442
 
9.2%
e1414
 
9.0%
t1414
 
9.0%
n1144
 
7.3%
s1065
 
6.8%
o1004
 
6.4%
b963
 
6.1%
i759
 
4.8%
l658
 
4.2%
Other values (16)3660
23.3%
Decimal Number
ValueCountFrequency (%)
52773
36.1%
11436
18.7%
81091
 
14.2%
0834
 
10.9%
6401
 
5.2%
4298
 
3.9%
3287
 
3.7%
9277
 
3.6%
2186
 
2.4%
7101
 
1.3%
Other Punctuation
ValueCountFrequency (%)
/1091
73.0%
@344
 
23.0%
,29
 
1.9%
.16
 
1.1%
&14
 
0.9%
!1
 
0.1%
Space Separator
ValueCountFrequency (%)
8622
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1304
100.0%
Open Punctuation
ValueCountFrequency (%)
(5
100.0%
Close Punctuation
ValueCountFrequency (%)
)5
100.0%
Math Symbol
ValueCountFrequency (%)
=1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin37309
66.1%
Common19117
33.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
A2373
 
6.4%
E2281
 
6.1%
T2227
 
6.0%
a2185
 
5.9%
R2070
 
5.5%
I1803
 
4.8%
S1730
 
4.6%
N1479
 
4.0%
r1442
 
3.9%
e1414
 
3.8%
Other values (42)18305
49.1%
Common
ValueCountFrequency (%)
8622
45.1%
52773
 
14.5%
11436
 
7.5%
-1304
 
6.8%
/1091
 
5.7%
81091
 
5.7%
0834
 
4.4%
6401
 
2.1%
@344
 
1.8%
4298
 
1.6%
Other values (12)923
 
4.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII56426
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
8622
 
15.3%
52773
 
4.9%
A2373
 
4.2%
E2281
 
4.0%
T2227
 
3.9%
a2185
 
3.9%
R2070
 
3.7%
I1803
 
3.2%
S1730
 
3.1%
N1479
 
2.6%
Other values (64)28883
51.2%

is_school
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.1 MiB
0
407362 
1
 
322

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters: 407684
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0407362
99.9%
1322
 
0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
0407362
99.9%
1322
 
0.1%

Most occurring characters

ValueCountFrequency (%)
0407362
99.9%
1322
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number407684
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0407362
99.9%
1322
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Common407684
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0407362
99.9%
1322
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII407684
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0407362
99.9%
1322
 
0.1%

school_name
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct85
Distinct (%)26.4%
Missing407362
Missing (%)99.9%
Memory size3.1 MiB
Ibarra Elementary (San Diego Unified) 37683380108290
27 
Rancho Bernardo High (Poway Unified) 37682963730819
 
16
Del Norte High (Poway Unified) 37682960118935
 
15
Serra High (San Diego Unified) 37683383730173
 
13
Montgomery Senior High (Sweetwater Union High) 37684113738234
 
13
Other values (80)
238 

Length

Max length69
Median length66
Mean length52.7515528
Min length34

Characters and Unicode

Total characters: 16986
Distinct characters: 65
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 32
Unique (%): 9.9%

Sample

1st rowGarfield Elementary (San Diego Unified) 37683386039655
2nd rowGarfield Elementary (San Diego Unified) 37683386039655
3rd rowGarfield Elementary (San Diego Unified) 37683386039655
4th rowGarfield Elementary (San Diego Unified) 37683386039655
5th rowGrant K-8 (San Diego Unified) 37683386039671

Common Values

Value                                                            Count    Frequency (%)
Ibarra Elementary (San Diego Unified) 37683380108290             27       < 0.1%
Rancho Bernardo High (Poway Unified) 37682963730819              16       < 0.1%
Del Norte High (Poway Unified) 37682960118935                    15       < 0.1%
Serra High (San Diego Unified) 37683383730173                    13       < 0.1%
Montgomery Senior High (Sweetwater Union High) 37684113738234    13       < 0.1%
The O'Farrell Charter (San Diego Unified) 37683386061964         13       < 0.1%
De Portola Middle (San Diego Unified) 37683386106181             13       < 0.1%
Torrey Pines High (San Dieguito Union High) 37683463730033       11       < 0.1%
Sunset Elementary (San Ysidro Elementary) 37683796093264         11       < 0.1%
San Ysidro High (Sweetwater Union High) 37684113731502           9        < 0.1%
Other values (75)                                                181      < 0.1%
(Missing)                                                        407362   99.9%
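Each `school_name` value shown here packs three pieces of information into one string: a school name, a district in parentheses, and a trailing 14-digit identifier. The sketch below splits them with a regular expression, assuming every non-missing value follows that layout (the matching counts of 322 opening and 322 closing parentheses are consistent with this, but it is not guaranteed). `parse_school` is a hypothetical helper.

```python
import re

# Assumed layout: "<school> (<district>) <14 digits>"
SCHOOL_PATTERN = re.compile(
    r"^(?P<school>.+?) \((?P<district>[^()]+)\) (?P<code>\d{14})$"
)

def parse_school(value):
    """Return (school, district, code), or None if the value does not fit the layout."""
    match = SCHOOL_PATTERN.match(value)
    if match is None:
        return None
    return match.group("school"), match.group("district"), match.group("code")

print(parse_school("Ibarra Elementary (San Diego Unified) 37683380108290"))
print(parse_school("Grant K-8 (San Diego Unified) 37683386039671"))
```

The code is kept as a string so that no digits are lost to numeric conversion.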

Length

Histogram of lengths of the category
ValueCountFrequency (%)
san222
 
10.8%
unified221
 
10.7%
high165
 
8.0%
diego149
 
7.2%
elementary144
 
7.0%
poway72
 
3.5%
union62
 
3.0%
middle50
 
2.4%
ysidro48
 
2.3%
sweetwater31
 
1.5%
Other values (200)896
43.5%

Most occurring characters

ValueCountFrequency (%)
1738
 
10.2%
e1193
 
7.0%
i1088
 
6.4%
31054
 
6.2%
n915
 
5.4%
a813
 
4.8%
6688
 
4.1%
8636
 
3.7%
r616
 
3.6%
o589
 
3.5%
Other values (55)7656
45.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter8285
48.8%
Decimal Number4515
26.6%
Uppercase Letter1774
 
10.4%
Space Separator1738
 
10.2%
Close Punctuation322
 
1.9%
Open Punctuation322
 
1.9%
Other Punctuation23
 
0.1%
Dash Punctuation7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1193
14.4%
i1088
13.1%
n915
11.0%
a813
9.8%
r616
7.4%
o589
 
7.1%
d431
 
5.2%
t402
 
4.9%
g375
 
4.5%
l345
 
4.2%
Other values (14)1518
18.3%
Uppercase Letter
ValueCountFrequency (%)
S331
18.7%
U285
16.1%
D214
12.1%
H179
10.1%
E149
8.4%
P113
 
6.4%
M102
 
5.7%
C90
 
5.1%
Y48
 
2.7%
B46
 
2.6%
Other values (14)217
12.2%
Decimal Number
ValueCountFrequency (%)
31054
23.3%
6688
15.2%
8636
14.1%
7542
12.0%
0467
10.3%
1377
 
8.3%
9280
 
6.2%
2206
 
4.6%
4147
 
3.3%
5118
 
2.6%
Other Punctuation
ValueCountFrequency (%)
'13
56.5%
.9
39.1%
/1
 
4.3%
Space Separator
ValueCountFrequency (%)
1738
100.0%
Close Punctuation
ValueCountFrequency (%)
)322
100.0%
Open Punctuation
ValueCountFrequency (%)
(322
100.0%
Dash Punctuation
ValueCountFrequency (%)
-7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin10059
59.2%
Common6927
40.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1193
 
11.9%
i1088
 
10.8%
n915
 
9.1%
a813
 
8.1%
r616
 
6.1%
o589
 
5.9%
d431
 
4.3%
t402
 
4.0%
g375
 
3.7%
l345
 
3.4%
Other values (38)3292
32.7%
Common
ValueCountFrequency (%)
1738
25.1%
31054
15.2%
6688
 
9.9%
8636
 
9.2%
7542
 
7.8%
0467
 
6.7%
1377
 
5.4%
)322
 
4.6%
(322
 
4.6%
9280
 
4.0%
Other values (7)501
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII16986
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1738
 
10.2%
e1193
 
7.0%
i1088
 
6.4%
31054
 
6.2%
n915
 
5.4%
a813
 
4.8%
6688
 
4.1%
8636
 
3.7%
r616
 
3.6%
o589
 
3.5%
Other values (55)7656
45.1%

city
Categorical

HIGH CORRELATION

Distinct46
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.1 MiB
SAN DIEGO
400853 
SAN YSIDRO
 
2072
CHULA VISTA
 
1033
NATIONAL CITY
 
783
EL CAJON
 
415
Other values (41)
 
2528

Length

Max length36
Median length9
Mean length9.017876591
Min length4

Characters and Unicode

Total characters: 3676444
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): < 0.1%

Sample

1st rowSAN DIEGO
2nd rowLA JOLLA
3rd rowSAN DIEGO
4th rowSAN DIEGO
5th rowSAN DIEGO

Common Values

ValueCountFrequency (%)
SAN DIEGO400853
98.3%
SAN YSIDRO2072
 
0.5%
CHULA VISTA1033
 
0.3%
NATIONAL CITY783
 
0.2%
EL CAJON415
 
0.1%
ESCONDIDO398
 
0.1%
LA MESA310
 
0.1%
LA JOLLA287
 
0.1%
LEMON GROVE282
 
0.1%
SANTEE209
 
0.1%
Other values (36)1042
 
0.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
san403140
49.5%
diego400856
49.2%
ysidro2072
 
0.3%
vista1034
 
0.1%
chula1033
 
0.1%
national783
 
0.1%
city783
 
0.1%
la597
 
0.1%
el415
 
0.1%
cajon415
 
0.1%
Other values (49)3319
 
0.4%

Most occurring characters

ValueCountFrequency (%)
A409647
11.1%
S407720
11.1%
406763
11.1%
I406683
11.1%
O406433
11.1%
N406432
11.1%
D403934
11.0%
E403924
11.0%
G401346
10.9%
L4731
 
0.1%
Other values (15)18831
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter3269678
88.9%
Space Separator406763
 
11.1%
Dash Punctuation3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A409647
12.5%
S407720
12.5%
I406683
12.4%
O406433
12.4%
N406432
12.4%
D403934
12.4%
E403924
12.4%
G401346
12.3%
L4731
 
0.1%
Y3255
 
0.1%
Other values (13)15573
 
0.5%
Space Separator
ValueCountFrequency (%)
406763
100.0%
Dash Punctuation
ValueCountFrequency (%)
-3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3269678
88.9%
Common406766
 
11.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
A409647
12.5%
S407720
12.5%
I406683
12.4%
O406433
12.4%
N406432
12.4%
D403934
12.4%
E403924
12.4%
G401346
12.3%
L4731
 
0.1%
Y3255
 
0.1%
Other values (13)15573
 
0.5%
Common
ValueCountFrequency (%)
406763
> 99.9%
-3
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII3676444
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A409647
11.1%
S407720
11.1%
406763
11.1%
I406683
11.1%
O406433
11.1%
N406432
11.1%
D403934
11.0%
E403924
11.0%
G401346
10.9%
L4731
 
0.1%
Other values (15)18831
 
0.5%

beat
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 126
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 511.4569667
Minimum: 111
Maximum: 999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 111
5-th percentile: 121
Q1: 315
median: 521
Q3: 628
95-th percentile: 931
Maximum: 999
Range: 888
Interquartile range (IQR): 313

Descriptive statistics

Standard deviation: 242.5767683
Coefficient of variation (CV): 0.4742857837
Kurtosis: -0.7738950676
Mean: 511.4569667
Median Absolute Deviation (MAD): 194
Skewness: -0.07003374112
Sum: 208512822
Variance: 58843.48851
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
521                 29982    7.4%
122                 24592    6.0%
611                 16419    4.0%
614                 10569    2.6%
524                 10192    2.5%
712                 9851     2.4%
512                 9651     2.4%
813                 9582     2.4%
999                 9483     2.3%
121                 8744     2.1%
Other values (116)  268619   65.9%
Value               Count    Frequency (%)
111                 4467     1.1%
112                 1542     0.4%
113                 1514     0.4%
114                 3534     0.9%
115                 4463     1.1%
116                 3624     0.9%
121                 8744     2.1%
122                 24592    6.0%
123                 4779     1.2%
124                 4479     1.1%
Value               Count    Frequency (%)
999                 9483     2.3%
937                 1047     0.3%
936                 393      0.1%
935                 691      0.2%
934                 5119     1.3%
933                 1784     0.4%
932                 481      0.1%
931                 4407     1.1%
841                 715      0.2%
839                 1133     0.3%

beat_name
Categorical

HIGH CARDINALITY

Distinct127
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.1 MiB
East Village 521
 
29982
Pacific Beach 122
 
24592
Midway District 611
 
16419
Ocean Beach 614
 
10569
Core-Columbia 524
 
10192
Other values (122)
315930 

Length

Max length25
Median length22
Mean length15.90947646
Min length10

Characters and Unicode

Total characters: 6486039
Distinct characters: 64
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st rowCherokee Point 839
2nd rowLa Jolla 124
3rd rowPacific Beach 122
4th rowPacific Beach 122
5th rowOcean Beach 614

Common Values

Value                  Count    Frequency (%)
East Village 521       29982    7.4%
Pacific Beach 122      24592    6.0%
Midway District 611    16419    4.0%
Ocean Beach 614        10569    2.6%
Core-Columbia 524      10192    2.5%
San Ysidro 712         9851     2.4%
Logan Heights 512      9651     2.4%
North Park 813         9582     2.4%
Unknown 999            9411     2.3%
Mission Beach 121      8744     2.1%
Other values (117)     268691   65.9%
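The `beat_name` values appear to be a neighbourhood name followed by the three-digit `beat` code, so the two columns largely duplicate each other. The sketch below separates the two parts, assuming every value ends in a space and three digits; `split_beat_name` is a hypothetical helper.

```python
import re

# Assumed layout: "<name> <3-digit beat code>"
BEAT_NAME = re.compile(r"^(?P<name>.+?)\s+(?P<code>\d{3})$")

def split_beat_name(value):
    """Return (name, code) for values like 'East Village 521', else None."""
    match = BEAT_NAME.match(value)
    if match is None:
        return None
    return match.group("name"), int(match.group("code"))

for raw in ["East Village 521", "Core-Columbia 524", "Unknown 999"]:
    print(split_beat_name(raw))
```

Comparing the extracted code with the `beat` column would be a natural next step: the report lists 126 distinct `beat` values against 127 distinct `beat_name` values, and 9483 rows with beat 999 against 9411 rows named "Unknown 999", so the mapping does not seem to be strictly one-to-one.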

Length

Histogram of lengths of the category
ValueCountFrequency (%)
east48357
 
4.1%
beach43905
 
3.7%
park41321
 
3.5%
mesa35028
 
3.0%
village32954
 
2.8%
52129982
 
2.5%
mission26001
 
2.2%
pacific24592
 
2.1%
12224592
 
2.1%
heights21711
 
1.8%
Other values (263)851684
72.2%

Most occurring characters

ValueCountFrequency (%)
772443
 
11.9%
a565930
 
8.7%
e377516
 
5.8%
i375627
 
5.8%
1309397
 
4.8%
l292727
 
4.5%
r273757
 
4.2%
2272499
 
4.2%
s268438
 
4.1%
t259143
 
4.0%
Other values (54)2718562
41.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3682859
56.8%
Decimal Number1223052
 
18.9%
Uppercase Letter789444
 
12.2%
Space Separator772443
 
11.9%
Dash Punctuation10192
 
0.2%
Other Punctuation8049
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a565930
15.4%
e377516
10.3%
i375627
10.2%
l292727
7.9%
r273757
7.4%
s268438
7.3%
t259143
7.0%
o251522
6.8%
n233816
 
6.3%
c157915
 
4.3%
Other values (16)626468
17.0%
Uppercase Letter
ValueCountFrequency (%)
M109569
13.9%
P80851
10.2%
B75772
9.6%
C72612
9.2%
V69291
8.8%
E55874
 
7.1%
H46169
 
5.8%
S41954
 
5.3%
L40574
 
5.1%
O27049
 
3.4%
Other values (14)169729
21.5%
Decimal Number
ValueCountFrequency (%)
1309397
25.3%
2272499
22.3%
3153382
12.5%
4127141
10.4%
5123115
 
10.1%
683727
 
6.8%
858189
 
4.8%
749152
 
4.0%
946450
 
3.8%
Other Punctuation
ValueCountFrequency (%)
/5351
66.5%
.1816
 
22.6%
'882
 
11.0%
Space Separator
ValueCountFrequency (%)
772443
100.0%
Dash Punctuation
ValueCountFrequency (%)
-10192
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4472303
69.0%
Common2013736
31.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a565930
 
12.7%
e377516
 
8.4%
i375627
 
8.4%
l292727
 
6.5%
r273757
 
6.1%
s268438
 
6.0%
t259143
 
5.8%
o251522
 
5.6%
n233816
 
5.2%
c157915
 
3.5%
Other values (40)1415912
31.7%
Common
ValueCountFrequency (%)
772443
38.4%
1309397
15.4%
2272499
 
13.5%
3153382
 
7.6%
4127141
 
6.3%
5123115
 
6.1%
683727
 
4.2%
858189
 
2.9%
749152
 
2.4%
946450
 
2.3%
Other values (4)18241
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII6486039
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
772443
 
11.9%
a565930
 
8.7%
e377516
 
5.8%
i375627
 
5.8%
1309397
 
4.8%
l292727
 
4.5%
r273757
 
4.2%
2272499
 
4.2%
s268438
 
4.1%
t259143
 
4.0%
Other values (54)2718562
41.9%

is_student
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.1 MiB
0
407522 
1
 
162

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters: 407684
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0

Common Values

ValueCountFrequency (%)
0407522
> 99.9%
1162
 
< 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
0407522
> 99.9%
1162
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0407522
> 99.9%
1162
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number407684
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0407522
> 99.9%
1162
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common407684
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0407522
> 99.9%
1162
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII407684
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0407522
> 99.9%
1162
 
< 0.1%

lim_eng
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.1 MiB
0
399890 
1
 
7794

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters: 407684
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row1
2nd row0
3rd row0
4th row0
5th row1

Common Values

ValueCountFrequency (%)
0399890
98.1%
17794
 
1.9%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
0399890
98.1%
17794
 
1.9%

Most occurring characters

ValueCountFrequency (%)
0399890
98.1%
17794
 
1.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number407684
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0399890
98.1%
17794
 
1.9%

Most occurring scripts

ValueCountFrequency (%)
Common407684
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0399890
98.1%
17794
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII407684
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0399890
98.1%
17794
 
1.9%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct        102
Distinct (%)    < 0.1%
Missing         0
Missing (%)     0.0%
Infinite        0
Infinite (%)    0.0%
Mean            37.3154919
Minimum         1
Maximum         120
Zeros           0
Zeros (%)       0.0%
Negative        0
Negative (%)    0.0%
Memory size     3.1 MiB

Quantile statistics

Minimum                    1
5-th percentile            20
Q1                         26
median                     35
Q3                         46
95-th percentile           60
Maximum                    120
Range                      119
Interquartile range (IQR)  20

Descriptive statistics

Standard deviation               13.43056039
Coefficient of variation (CV)    0.359919157
Kurtosis                         -0.2227158246
Mean                             37.3154919
Median Absolute Deviation (MAD)  10
Skewness                         0.5885595043
Sum                              15212929
Variance                         180.3799524
Monotonicity                     Not monotonic
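
Several of the figures in this block can be derived from the others: range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, and sum = mean x number of observations. The sketch below re-checks those relations for age using the values printed in this report and the 407684 observations (age has no missing values). The variable names are illustrative only and are not taken from the report.

```python
# Figures copied from the age summary in this report.
n_obs = 407684
minimum, maximum = 1, 120
q1, q3 = 26, 46
mean = 37.3154919
std = 13.43056039

value_range = maximum - minimum   # reported Range: 119
iqr = q3 - q1                     # reported IQR: 20
cv = std / mean                   # reported CV: 0.359919157
variance = std ** 2               # reported Variance: 180.3799524
total = mean * n_obs              # reported Sum: 15212929

print(value_range, iqr, round(cv, 6), round(variance, 4), round(total))
```

Small differences in the last digits are expected, because the inputs are themselves rounded in the report.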
Histogram with fixed size bins (bins=50)
Value              Count   Frequency (%)
30                 58428   14.3%
40                 42673   10.5%
25                 40305   9.9%
50                 35877   8.8%
35                 33339   8.2%
45                 24881   6.1%
60                 20837   5.1%
20                 20273   5.0%
55                 16130   4.0%
21                 6938    1.7%
Other values (92)  108003  26.5%

Value  Count  Frequency (%)
1      12     < 0.1%
2      5      < 0.1%
3      3      < 0.1%
4      13     < 0.1%
5      38     < 0.1%
6      13     < 0.1%
7      37     < 0.1%
8      71     < 0.1%
9      43     < 0.1%
10     320    0.1%

Value  Count  Frequency (%)
120    3      < 0.1%
116    1      < 0.1%
100    15     < 0.1%
99     18     < 0.1%
98     3      < 0.1%
97     4      < 0.1%
96     1      < 0.1%
95     20     < 0.1%
94     6      < 0.1%
93     7      < 0.1%

gender_words
Categorical

HIGH CORRELATION

Distinct        4
Distinct (%)    < 0.1%
Missing         90
Missing (%)     < 0.1%
Memory size     3.1 MiB

Value                   Count
Male                    297732
Female                  108822
Transgender man/boy     548
Transgender woman/girl  492

Length

Max length      22
Median length   4
Mean length     4.575867162
Min length      4

Characters and Unicode

Total characters     1865096
Distinct characters  19
Distinct categories  4
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
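
As a rough illustration of how such character properties can be tallied, the sketch below applies Python's standard `unicodedata.category` to the four gender_words values, weighted by the counts reported for this variable. The two-letter codes it returns (Lu, Ll, Zs, Po) correspond to Uppercase Letter, Lowercase Letter, Space Separator, and Other Punctuation. How the report itself computes its figures is not shown in this document, so this is a reconstruction under that assumption, and the variable names are illustrative.

```python
import unicodedata
from collections import Counter

# Value counts copied from the gender_words summary in this report.
value_counts = {
    "Male": 297732,
    "Female": 108822,
    "Transgender man/boy": 548,
    "Transgender woman/girl": 492,
}

# Tally the Unicode general category of every character,
# weighting each value by how many rows contain it.
category_totals = Counter()
for value, count in value_counts.items():
    for ch in value:
        category_totals[unicodedata.category(ch)] += count

total_characters = sum(category_totals.values())
print(dict(category_totals), total_characters)
```

The totals can be compared with the "Most occurring categories" table for this variable.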

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  Male
2nd row  Female
3rd row  Female
4th row  Male
5th row  Male

Common Values

Value                   Count   Frequency (%)
Male                    297732  73.0%
Female                  108822  26.7%
Transgender man/boy     548     0.1%
Transgender woman/girl  492     0.1%
(Missing)               90      < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
male297732
72.9%
female108822
 
26.6%
transgender1040
 
0.3%
man/boy548
 
0.1%
woman/girl492
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e517456
27.7%
a408634
21.9%
l407046
21.8%
M297732
16.0%
m109862
 
5.9%
F108822
 
5.8%
n3120
 
0.2%
r2572
 
0.1%
g1532
 
0.1%
1040
 
0.1%
Other values (9)7280
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1455422
78.0%
Uppercase Letter407594
 
21.9%
Space Separator1040
 
0.1%
Other Punctuation1040
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e517456
35.6%
a408634
28.1%
l407046
28.0%
m109862
 
7.5%
n3120
 
0.2%
r2572
 
0.2%
g1532
 
0.1%
o1040
 
0.1%
s1040
 
0.1%
d1040
 
0.1%
Other values (4)2080
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
M297732
73.0%
F108822
 
26.7%
T1040
 
0.3%
Space Separator
ValueCountFrequency (%)
1040
100.0%
Other Punctuation
ValueCountFrequency (%)
/1040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1863016
99.9%
Common2080
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e517456
27.8%
a408634
21.9%
l407046
21.8%
M297732
16.0%
m109862
 
5.9%
F108822
 
5.8%
n3120
 
0.2%
r2572
 
0.1%
g1532
 
0.1%
o1040
 
0.1%
Other values (7)5200
 
0.3%
Common
ValueCountFrequency (%)
1040
50.0%
/1040
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1865096
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e517456
27.7%
a408634
21.9%
l407046
21.8%
M297732
16.0%
m109862
 
5.9%
F108822
 
5.8%
n3120
 
0.2%
r2572
 
0.1%
g1532
 
0.1%
1040
 
0.1%
Other values (9)7280
 
0.4%

is_gendnc
Categorical

HIGH CORRELATION

Distinct        2
Distinct (%)    < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     3.1 MiB

Value  Count
0      407507
1      177

Length

Max length      1
Median length   1
Mean length     1
Min length      1

Characters and Unicode

Total characters     407684
Distinct characters  2
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  0
2nd row  0
3rd row  0
4th row  0
5th row  0

Common Values

Value  Count   Frequency (%)
0      407507  > 99.9%
1      177     < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
0407507
> 99.9%
1177
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
0407507
> 99.9%
1177
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number407684
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0407507
> 99.9%
1177
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common407684
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0407507
> 99.9%
1177
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII407684
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0407507
> 99.9%
1177
 
< 0.1%

gender_code
Categorical

HIGH CORRELATION

Distinct        5
Distinct (%)    < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     3.1 MiB

Value  Count
1      297732
2      108822
3      548
4      492
0      90

Length

Max length      1
Median length   1
Mean length     1
Min length      1

Characters and Unicode

Total characters     407684
Distinct characters  5
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  1
2nd row  2
3rd row  2
4th row  1
5th row  1

Common Values

Value  Count   Frequency (%)
1      297732  73.0%
2      108822  26.7%
3      548     0.1%
4      492     0.1%
0      90      < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
1297732
73.0%
2108822
 
26.7%
3548
 
0.1%
4492
 
0.1%
090
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
1297732
73.0%
2108822
 
26.7%
3548
 
0.1%
4492
 
0.1%
090
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number407684
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1297732
73.0%
2108822
 
26.7%
3548
 
0.1%
4492
 
0.1%
090
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common407684
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1297732
73.0%
2108822
 
26.7%
3548
 
0.1%
4492
 
0.1%
090
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII407684
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1297732
73.0%
2108822
 
26.7%
3548
 
0.1%
4492
 
0.1%
090
 
< 0.1%

gendnc_code
Categorical

CONSTANT
MISSING
REJECTED

Distinct        1
Distinct (%)    0.6%
Missing         407507
Missing (%)     > 99.9%
Memory size     3.1 MiB

Value  Count
5.0    177

Length

Max length      3
Median length   3
Mean length     3
Min length      3

Characters and Unicode

Total characters     531
Distinct characters  3
Distinct categories  2
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  5.0
2nd row  5.0
3rd row  5.0
4th row  5.0
5th row  5.0

Common Values

Value      Count   Frequency (%)
5.0        177     < 0.1%
(Missing)  407507  > 99.9%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
5.0177
100.0%

Most occurring characters

ValueCountFrequency (%)
5177
33.3%
.177
33.3%
0177
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number354
66.7%
Other Punctuation177
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5177
50.0%
0177
50.0%
Other Punctuation
ValueCountFrequency (%)
.177
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common531
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
5177
33.3%
.177
33.3%
0177
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII531
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5177
33.3%
.177
33.3%
0177
33.3%

lgbt
Boolean

HIGH CORRELATION

Distinct        2
Distinct (%)    < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     398.3 KiB

Value  Count   Frequency (%)
False  396960  97.4%
True   10724   2.6%

race
Categorical

HIGH CORRELATION

Distinct        6
Distinct (%)    < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     3.1 MiB

Value  Count
white  170777
hisp   119669
black  82052
asian  31260
nhopi  3112

Length

Max length      5
Median length   5
Mean length     4.704469638
Min length      4

Characters and Unicode

Total characters     1917937
Distinct characters  14
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  hisp
2nd row  white
3rd row  white
4th row  hisp
5th row  black

Common Values

Value  Count   Frequency (%)
white  170777  41.9%
hisp   119669  29.4%
black  82052   20.1%
asian  31260   7.7%
nhopi  3112    0.8%
aian   814     0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
white170777
41.9%
hisp119669
29.4%
black82052
20.1%
asian31260
 
7.7%
nhopi3112
 
0.8%
aian814
 
0.2%

Most occurring characters

ValueCountFrequency (%)
i325632
17.0%
h293558
15.3%
w170777
8.9%
t170777
8.9%
e170777
8.9%
s150929
7.9%
a146200
7.6%
p122781
 
6.4%
b82052
 
4.3%
l82052
 
4.3%
Other values (4)202402
10.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1917937
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i325632
17.0%
h293558
15.3%
w170777
8.9%
t170777
8.9%
e170777
8.9%
s150929
7.9%
a146200
7.6%
p122781
 
6.4%
b82052
 
4.3%
l82052
 
4.3%
Other values (4)202402
10.6%

Most occurring scripts

ValueCountFrequency (%)
Latin1917937
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i325632
17.0%
h293558
15.3%
w170777
8.9%
t170777
8.9%
e170777
8.9%
s150929
7.9%
a146200
7.6%
p122781
 
6.4%
b82052
 
4.3%
l82052
 
4.3%
Other values (4)202402
10.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII1917937
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i325632
17.0%
h293558
15.3%
w170777
8.9%
t170777
8.9%
e170777
8.9%
s150929
7.9%
a146200
7.6%
p122781
 
6.4%
b82052
 
4.3%
l82052
 
4.3%
Other values (4)202402
10.6%

disability
Categorical

HIGH CARDINALITY

Distinct        134
Distinct (%)    < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     3.1 MiB

Value  Count
None  388791
Mental health condition  13848
Other disability  1776
Intellectual or developmental disability, including dementia  626
Speech impairment or limited use of language  558
Other values (129)  2085

Length

Max length      201
Median length   4
Mean length     5.12283288
Min length      4

Characters and Unicode

Total characters     2088497
Distinct characters  30
Distinct categories  5
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      65
Unique (%)  < 0.1%

Sample

1st row  None
2nd row  None
3rd row  None
4th row  None
5th row  None

Common Values

Value  Count  Frequency (%)
None  388791  95.4%
Mental health condition  13848  3.4%
Other disability  1776  0.4%
Intellectual or developmental disability, including dementia  626  0.2%
Speech impairment or limited use of language  558  0.1%
Deafness or difficulty hearing  461  0.1%
Intellectual or developmental disability, including dementia|Mental health condition  285  0.1%
Blind or limited vision  268  0.1%
Mental health condition|Intellectual or developmental disability, including dementia  241  0.1%
Mental health condition|Other disability  124  < 0.1%
Other values (124)  706  0.2%
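
Several disability values join more than one label with a `|` character, and the same pair can appear in either order, which inflates the distinct count. The sketch below assumes `|` is a multi-select separator (an inference from the values shown, not something this report states) and re-tallies the ten most common rows by single label and by order-independent combination. It covers only those ten rows, so the totals are partial and do not describe the whole column.

```python
from collections import Counter

# Ten most common disability values, copied from this report (not the full column).
rows = {
    "None": 388791,
    "Mental health condition": 13848,
    "Other disability": 1776,
    "Intellectual or developmental disability, including dementia": 626,
    "Speech impairment or limited use of language": 558,
    "Deafness or difficulty hearing": 461,
    "Intellectual or developmental disability, including dementia|Mental health condition": 285,
    "Blind or limited vision": 268,
    "Mental health condition|Intellectual or developmental disability, including dementia": 241,
    "Mental health condition|Other disability": 124,
}

label_counts = Counter()   # occurrences of each single label
combo_counts = Counter()   # combinations, ignoring label order
for value, count in rows.items():
    labels = value.split("|")   # assumed separator
    for label in labels:
        label_counts[label] += count
    combo_counts[frozenset(labels)] += count

for label, count in label_counts.most_common():
    print(f"{count:>7}  {label}")
```

Using a `frozenset` as the key merges the two orderings of the same pair into one entry.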

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none388791
85.4%
health14978
 
3.3%
condition14399
 
3.2%
mental14372
 
3.2%
disability3359
 
0.7%
or3355
 
0.7%
other1954
 
0.4%
developmental1387
 
0.3%
including1387
 
0.3%
limited1275
 
0.3%
Other values (50)10082
 
2.2%

Most occurring characters

ValueCountFrequency (%)
n444578
21.3%
e438417
21.0%
o424782
20.3%
N388791
18.6%
t59149
 
2.8%
i52496
 
2.5%
47655
 
2.3%
l45125
 
2.2%
a41741
 
2.0%
h33735
 
1.6%
Other values (20)112028
 
5.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1628493
78.0%
Uppercase Letter409323
 
19.6%
Space Separator47655
 
2.3%
Math Symbol1639
 
0.1%
Other Punctuation1387
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n444578
27.3%
e438417
26.9%
o424782
26.1%
t59149
 
3.6%
i52496
 
3.2%
l45125
 
2.8%
a41741
 
2.6%
h33735
 
2.1%
d25090
 
1.5%
c19323
 
1.2%
Other values (10)44057
 
2.7%
Uppercase Letter
ValueCountFrequency (%)
N388791
95.0%
M14978
 
3.7%
O2199
 
0.5%
I1387
 
0.3%
S878
 
0.2%
D693
 
0.2%
B397
 
0.1%
Space Separator
ValueCountFrequency (%)
47655
100.0%
Math Symbol
ValueCountFrequency (%)
|1639
100.0%
Other Punctuation
ValueCountFrequency (%)
,1387
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2037816
97.6%
Common50681
 
2.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
n444578
21.8%
e438417
21.5%
o424782
20.8%
N388791
19.1%
t59149
 
2.9%
i52496
 
2.6%
l45125
 
2.2%
a41741
 
2.0%
h33735
 
1.7%
d25090
 
1.2%
Other values (17)83912
 
4.1%
Common
ValueCountFrequency (%)
47655
94.0%
|1639
 
3.2%
,1387
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII2088497
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n444578
21.3%
e438417
21.0%
o424782
20.3%
N388791
18.6%
t59149
 
2.8%
i52496
 
2.5%
47655
 
2.3%
l45125
 
2.2%
a41741
 
2.0%
h33735
 
1.6%
Other values (20)112028
 
5.4%

reason_words
Categorical

HIGH CORRELATION

Distinct        8
Distinct (%)    < 0.1%
Missing         5
Missing (%)     < 0.1%
Memory size     3.1 MiB

Value  Count
Reasonable Suspicion  213781
Traffic Violation  175061
Investigation to determine whether the person was truant  5358
Known to be on Parole / Probation / PRCS / Mandatory Supervision  5126
Consensual Encounter resulting in a search  4418
Other values (3)  3935

Length

Max length      113
Median length   20
Mean length     20.29560757
Min length      17

Characters and Unicode

Total characters     8274093
Distinct characters  43
Distinct categories  5
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  Traffic Violation
2nd row  Reasonable Suspicion
3rd row  Reasonable Suspicion
4th row  Reasonable Suspicion
5th row  Reasonable Suspicion

Common Values

Value  Count  Frequency (%)
Reasonable Suspicion  213781  52.4%
Traffic Violation  175061  42.9%
Investigation to determine whether the person was truant  5358  1.3%
Known to be on Parole / Probation / PRCS / Mandatory Supervision  5126  1.3%
Consensual Encounter resulting in a search  4418  1.1%
Knowledge of outstanding arrest warrant/wanted person  3904  1.0%
Determine whether the student violated school policy  27  < 0.1%
Possible conduct warranting discipline under Education Code sections 48900, 48900.2, 48900.3, 48900.4 and 48900.7  4  < 0.1%
(Missing)  5  < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
reasonable213781
22.9%
suspicion213781
22.9%
traffic175061
18.8%
violation175061
18.8%
15378
 
1.6%
to10484
 
1.1%
person9262
 
1.0%
determine5385
 
0.6%
whether5385
 
0.6%
the5385
 
0.6%
Other values (40)103274
11.1%

Most occurring characters

ValueCountFrequency (%)
i997046
12.1%
o859346
 
10.4%
a847079
 
10.2%
n710187
 
8.6%
524558
 
6.3%
e523232
 
6.3%
s478220
 
5.8%
l406797
 
4.9%
c397752
 
4.8%
f354026
 
4.3%
Other values (33)2175850
26.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6888154
83.2%
Uppercase Letter841955
 
10.2%
Space Separator524558
 
6.3%
Other Punctuation19310
 
0.2%
Decimal Number116
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i997046
14.5%
o859346
12.5%
a847079
12.3%
n710187
10.3%
e523232
7.6%
s478220
6.9%
l406797
 
5.9%
c397752
 
5.8%
f354026
 
5.1%
t261837
 
3.8%
Other values (11)1052632
15.3%
Uppercase Letter
ValueCountFrequency (%)
S224033
26.6%
R218907
26.0%
V175061
20.8%
T175061
20.8%
P15382
 
1.8%
C9548
 
1.1%
K9030
 
1.1%
I5358
 
0.6%
M5126
 
0.6%
E4422
 
0.5%
Decimal Number
ValueCountFrequency (%)
040
34.5%
424
20.7%
820
17.2%
920
17.2%
24
 
3.4%
34
 
3.4%
74
 
3.4%
Other Punctuation
ValueCountFrequency (%)
/19282
99.9%
.16
 
0.1%
,12
 
0.1%
Space Separator
ValueCountFrequency (%)
524558
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7730109
93.4%
Common543984
 
6.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
i997046
12.9%
o859346
11.1%
a847079
11.0%
n710187
 
9.2%
e523232
 
6.8%
s478220
 
6.2%
l406797
 
5.3%
c397752
 
5.1%
f354026
 
4.6%
t261837
 
3.4%
Other values (22)1894587
24.5%
Common
ValueCountFrequency (%)
524558
96.4%
/19282
 
3.5%
040
 
< 0.1%
424
 
< 0.1%
820
 
< 0.1%
920
 
< 0.1%
.16
 
< 0.1%
,12
 
< 0.1%
24
 
< 0.1%
34
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII8274093
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i997046
12.1%
o859346
 
10.4%
a847079
 
10.2%
n710187
 
8.6%
524558
 
6.3%
e523232
 
6.3%
s478220
 
5.8%
l406797
 
4.9%
c397752
 
4.8%
f354026
 
4.3%
Other values (33)2175850
26.3%

reasonid
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct        1693
Distinct (%)    0.4%
Missing         18844
Missing (%)     4.6%
Infinite        0
Infinite (%)    0.0%
Mean            51661.96158
Minimum         3
Maximum         99999
Zeros           0
Zeros (%)       0.0%
Negative        0
Negative (%)    0.0%
Memory size     3.1 MiB

Quantile statistics

Minimum                    3
5-th percentile            22004
Q1                         41063
median                     54153
Q3                         54655
95-th percentile           99990
Maximum                    99999
Range                      99996
Interquartile range (IQR)  13592

Descriptive statistics

Standard deviation               17956.08335
Coefficient of variation (CV)    0.3475687488
Kurtosis                         1.502584076
Mean                             51661.96158
Median Absolute Deviation (MAD)  2017
Skewness                         0.4606850036
Sum                              2.008823714 × 10^10
Variance                         322420929.2
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
65002                28446   7.0%
32022                18215   4.5%
99990                17159   4.2%
32111                16823   4.1%
54167                16209   4.0%
65000                14655   3.6%
54106                14147   3.5%
54655                11917   2.9%
41063                9946    2.4%
54146                9512    2.3%
Other values (1683)  231811  56.9%
(Missing)            18844   4.6%

Value  Count  Frequency (%)
3      27     < 0.1%
3065   1      < 0.1%
3068   1      < 0.1%
4021   6      < 0.1%
4022   67     < 0.1%
4023   1      < 0.1%
4026   1      < 0.1%
4028   1      < 0.1%
4031   5      < 0.1%
4032   4      < 0.1%

Value  Count  Frequency (%)
99999  5911   1.4%
99990  17159  4.2%
89105  4      < 0.1%
89005  9      < 0.1%
66218  3      < 0.1%
66211  82     < 0.1%
66210  204    0.1%
66208  1646   0.4%
66207  169    < 0.1%
66206  6      < 0.1%

reason_text
Categorical

HIGH CARDINALITY
MISSING

Distinct        1697
Distinct (%)    0.4%
Missing         18844
Missing (%)     4.6%
Memory size     3.1 MiB

Value  Count
65002 ZZ - LOCAL ORDINANCE VIOL (I) 65002  28446
602 PC - TRESPASSING (M) 32022  18215
647(E) PC - DIS CON:LODGE W/O CONSENT (M) 32111  16823
22450(A) VC - FAIL STOP VEH:XWALK/ETC (I) 54167  16209
65000 ZZ - LOCAL ORDINANCE VIOL (M) 65000  14655
Other values (1692)  294492

Length

Max length      56
Median length   53
Mean length     44.11001183
Min length      24

Characters and Unicode

Total characters     17151737
Distinct characters  49
Distinct categories  9
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      445
Unique (%)  0.1%

Sample

1st row  27150(A) VC - INADEQUATE MUFFLERS (I) 54116
2nd row  415(2) PC - LOUD/UNREASONABLE NOISE (I) 53130
3rd row  647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005
4th row  647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005
5th row  602 PC - TRESPASSING (M) 32022

Common Values

Value  Count  Frequency (%)
65002 ZZ - LOCAL ORDINANCE VIOL (I) 65002  28446  7.0%
602 PC - TRESPASSING (M) 32022  18215  4.5%
647(E) PC - DIS CON:LODGE W/O CONSENT (M) 32111  16823  4.1%
22450(A) VC - FAIL STOP VEH:XWALK/ETC (I) 54167  16209  4.0%
65000 ZZ - LOCAL ORDINANCE VIOL (M) 65000  14655  3.6%
NA - XX ZZ - COMMUNITY CARETAKING (X) 99990  14557  3.6%
22350 VC - UNSAFE SPEED:PREVAIL COND (I) 54106  14147  3.5%
25620 BP - POSS OPEN ALCOHOL:PUBLIC (I) 41063  9946  2.4%
23123.5 VC - NO HND HLD DEVICE W/DRIVE (I) 54655  9532  2.3%
21461(A) VC - DRIVER FAIL OBEY SIGN/ETC (I) 54146  9512  2.3%
Other values (1687)  236798  58.1%
(Missing)  18844  4.6%
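
The reason_text values shown for this variable appear to share one layout: a statute reference, a two-letter code, a description, a one-letter level in parentheses, and a trailing number that matches the values listed under reasonid. That layout is inferred from the rows printed in this report, not from a published specification, so the pattern and the field names in the sketch below are hypothetical and may not fit every one of the 1697 distinct values.

```python
import re

# Assumed layout: "<statute> <CC> - <description> (<L>) <reason id>"
PATTERN = re.compile(
    r"^(?P<statute>.+?) (?P<code>[A-Z]{2}) - (?P<description>.+) "
    r"\((?P<level>[A-Z])\) (?P<reason_id>\d+)$"
)

def parse_reason_text(text):
    """Split one reason_text value into named parts; return None if it does not fit."""
    match = PATTERN.match(text)
    return match.groupdict() if match else None

# Values copied from this report.
samples = [
    "22450(A) VC - FAIL STOP VEH:XWALK/ETC (I) 54167",
    "602 PC - TRESPASSING (M) 32022",
    "NA - XX ZZ - COMMUNITY CARETAKING (X) 99990",
]
for sample in samples:
    print(parse_reason_text(sample))
```

Values that do not match return None, which makes it easy to list the exceptions before relying on the parsed fields.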

Length

Histogram of lengths of the category
ValueCountFrequency (%)
412682
 
12.9%
i218922
 
6.8%
vc183432
 
5.7%
m124213
 
3.9%
pc111743
 
3.5%
zz62689
 
2.0%
6500256892
 
1.8%
fail51302
 
1.6%
viol51009
 
1.6%
local43101
 
1.3%
Other values (5845)1894179
59.0%

Most occurring characters

ValueCountFrequency (%)
2821324
 
16.4%
I819904
 
4.8%
E777645
 
4.5%
C738813
 
4.3%
A689810
 
4.0%
(631649
 
3.7%
)631602
 
3.7%
O612034
 
3.6%
0601951
 
3.5%
2564425
 
3.3%
Other values (39)8262580
48.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter8716397
50.8%
Decimal Number3603590
21.0%
Space Separator2821324
 
16.4%
Open Punctuation631649
 
3.7%
Close Punctuation631602
 
3.7%
Dash Punctuation413250
 
2.4%
Other Punctuation332906
 
1.9%
Currency Symbol829
 
< 0.1%
Math Symbol190
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I819904
 
9.4%
E777645
 
8.9%
C738813
 
8.5%
A689810
 
7.9%
O612034
 
7.0%
N548839
 
6.3%
L545179
 
6.3%
T438047
 
5.0%
S427035
 
4.9%
P415949
 
4.8%
Other values (16)2703142
31.0%
Decimal Number
ValueCountFrequency (%)
0601951
16.7%
2564425
15.7%
5561441
15.6%
4463016
12.8%
1439987
12.2%
6334787
9.3%
3279634
7.8%
9172665
 
4.8%
7118941
 
3.3%
866743
 
1.9%
Other Punctuation
ValueCountFrequency (%)
/169870
51.0%
:132445
39.8%
.27167
 
8.2%
&3285
 
1.0%
'95
 
< 0.1%
"42
 
< 0.1%
,2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2821324
100.0%
Open Punctuation
ValueCountFrequency (%)
(631649
100.0%
Close Punctuation
ValueCountFrequency (%)
)631602
100.0%
Dash Punctuation
ValueCountFrequency (%)
-413250
100.0%
Currency Symbol
ValueCountFrequency (%)
$829
100.0%
Math Symbol
ValueCountFrequency (%)
+190
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8716397
50.8%
Common8435340
49.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
I819904
 
9.4%
E777645
 
8.9%
C738813
 
8.5%
A689810
 
7.9%
O612034
 
7.0%
N548839
 
6.3%
L545179
 
6.3%
T438047
 
5.0%
S427035
 
4.9%
P415949
 
4.8%
Other values (16)2703142
31.0%
Common
ValueCountFrequency (%)
2821324
33.4%
(631649
 
7.5%
)631602
 
7.5%
0601951
 
7.1%
2564425
 
6.7%
5561441
 
6.7%
4463016
 
5.5%
1439987
 
5.2%
-413250
 
4.9%
6334787
 
4.0%
Other values (13)971908
 
11.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII17151737
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2821324
 
16.4%
I819904
 
4.8%
E777645
 
4.5%
C738813
 
4.3%
A689810
 
4.0%
(631649
 
3.7%
)631602
 
3.7%
O612034
 
3.6%
0601951
 
3.5%
2564425
 
3.3%
Other values (39)8262580
48.2%

reason_detail
Categorical

HIGH CARDINALITY
MISSING

Distinct        282
Distinct (%)    0.1%
Missing         18838
Missing (%)     4.6%
Memory size     3.1 MiB

Value  Count
Moving Violation  107984
Officer witnessed commission of a crime  85183
Matched suspect description  68932
Equipment Violation  51210
Other Reasonable Suspicion of a crime  36636
Other values (277)  38901

Length

Max length      232
Median length   210
Mean length     29.89846623
Min length      16

Characters and Unicode

Total characters     11625899
Distinct characters  46
Distinct categories  7
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      131
Unique (%)  < 0.1%

Sample

1st row  Equipment Violation
2nd row  Officer witnessed commission of a crime
3rd row  Officer witnessed commission of a crime
4th row  Officer witnessed commission of a crime
5th row  Matched suspect description

Common Values

Value  Count  Frequency (%)
Moving Violation  107984  26.5%
Officer witnessed commission of a crime  85183  20.9%
Matched suspect description  68932  16.9%
Equipment Violation  51210  12.6%
Other Reasonable Suspicion of a crime  36636  9.0%
Non-moving Violation, including Registration Violation  15867  3.9%
Witness or Victim identification of Suspect at the scene  8664  2.1%
Matched suspect description|Witness or Victim identification of Suspect at the scene  2714  0.7%
Matched suspect description|Officer witnessed commission of a crime  2175  0.5%
Witness or Victim identification of Suspect at the scene|Matched suspect description  1522  0.4%
Other values (272)  7959  2.0%
(Missing)  18838  4.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
violation190928
12.3%
of146809
 
9.5%
a130440
 
8.4%
crime126477
 
8.2%
moving107984
 
7.0%
suspect93364
 
6.0%
witnessed89822
 
5.8%
commission89822
 
5.8%
officer86627
 
5.6%
matched75197
 
4.9%
Other values (85)412092
26.6%

Most occurring characters

ValueCountFrequency (%)
i1468469
12.6%
1160716
 
10.0%
o1060373
 
9.1%
e913193
 
7.9%
n838158
 
7.2%
s755743
 
6.5%
t753249
 
6.5%
c671800
 
5.8%
a532953
 
4.6%
m392075
 
3.4%
Other values (36)3079170
26.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter9702646
83.5%
Space Separator1160716
 
10.0%
Uppercase Letter717984
 
6.2%
Other Punctuation15873
 
0.1%
Dash Punctuation15871
 
0.1%
Math Symbol12785
 
0.1%
Decimal Number24
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i1468469
15.1%
o1060373
10.9%
e913193
9.4%
n838158
8.6%
s755743
 
7.8%
t753249
 
7.8%
c671800
 
6.9%
a532953
 
5.5%
m392075
 
4.0%
r373189
 
3.8%
Other values (15)1943444
20.0%
Uppercase Letter
ValueCountFrequency (%)
V205240
28.6%
M187036
26.1%
O129951
18.1%
R55291
 
7.7%
S54783
 
7.6%
E51210
 
7.1%
N15867
 
2.2%
W14312
 
2.0%
A3251
 
0.5%
C705
 
0.1%
Decimal Number
ValueCountFrequency (%)
08
33.3%
46
25.0%
84
16.7%
94
16.7%
72
 
8.3%
Other Punctuation
ValueCountFrequency (%)
,15869
> 99.9%
.4
 
< 0.1%
Space Separator
ValueCountFrequency (%)
1160716
100.0%
Dash Punctuation
ValueCountFrequency (%)
-15871
100.0%
Math Symbol
ValueCountFrequency (%)
|12785
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin10420630
89.6%
Common1205269
 
10.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i1468469
14.1%
o1060373
 
10.2%
e913193
 
8.8%
n838158
 
8.0%
s755743
 
7.3%
t753249
 
7.2%
c671800
 
6.4%
a532953
 
5.1%
m392075
 
3.8%
r373189
 
3.6%
Other values (26)2661428
25.5%
Common
ValueCountFrequency (%)
1160716
96.3%
-15871
 
1.3%
,15869
 
1.3%
|12785
 
1.1%
08
 
< 0.1%
46
 
< 0.1%
84
 
< 0.1%
94
 
< 0.1%
.4
 
< 0.1%
72
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII11625899
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i1468469
12.6%
1160716
 
10.0%
o1060373
 
9.1%
e913193
 
7.9%
n838158
 
7.2%
s755743
 
6.5%
t753249
 
6.5%
c671800
 
5.8%
a532953
 
4.6%
m392075
 
3.4%
Other values (36)3079170
26.5%

reason_exp
Categorical

HIGH CARDINALITY

Distinct        183583
Distinct (%)    45.0%
Missing         82
Missing (%)     < 0.1%
Memory size     3.1 MiB

Value  Count
cell phone  4819
stop sign  4721
speeding  4497
SPEED  4155
encroachment  3658
Other values (183578)  385752

Length

Max length      250
Median length   235
Mean length     28.53012988
Min length      2

Characters and Unicode

Total characters     11628938
Distinct characters  92
Distinct categories  12
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      155880
Unique (%)  38.2%

Sample

1st row  LOUD EXHAUST
2nd row  loud party
3rd row  stumbling back and forth, unable to maintain balance
4th row  fighting with security
5th row  rc of male at vacant house

Common Values

Value  Count  Frequency (%)
cell phone  4819  1.2%
stop sign  4721  1.2%
speeding  4497  1.1%
SPEED  4155  1.0%
encroachment  3658  0.9%
radio call  3482  0.9%
speed  3479  0.9%
STOP SIGN  2446  0.6%
CELL PHONE  2071  0.5%
ped stop  2056  0.5%
Other values (183573)  372218  91.3%
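
Some of the most common reason_exp values differ only in letter case (for example "speed" and "SPEED"), which is one reason this free-text field has so many distinct values. The sketch below shows one possible normalization, case-folding and collapsing whitespace, applied to the ten most common values listed in this report. It is an illustration on those ten rows only, with names chosen here, and whether such merging is appropriate for an analysis is a separate judgment.

```python
from collections import Counter

# Ten most common reason_exp values, copied from this report (not the full column).
rows = {
    "cell phone": 4819,
    "stop sign": 4721,
    "speeding": 4497,
    "SPEED": 4155,
    "encroachment": 3658,
    "radio call": 3482,
    "speed": 3479,
    "STOP SIGN": 2446,
    "CELL PHONE": 2071,
    "ped stop": 2056,
}

normalized = Counter()
for text, count in rows.items():
    # Case-fold and collapse runs of whitespace so case variants share one key.
    key = " ".join(text.casefold().split())
    normalized[key] += count

for key, count in normalized.most_common():
    print(f"{count:>6}  {key}")
```

After normalization the ten rows collapse to seven keys; "speeding" stays separate from "speed" because only case and spacing are changed.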

Length

Histogram of lengths of the category
ValueCountFrequency (%)
in54094
 
2.7%
subject51302
 
2.6%
of50374
 
2.6%
on43020
 
2.2%
a40364
 
2.1%
was37897
 
1.9%
to35570
 
1.8%
stop35343
 
1.8%
call29540
 
1.5%
and27506
 
1.4%
Other values (28030)1562251
79.4%

Most occurring characters

ValueCountFrequency (%)
1563291
 
13.4%
e750025
 
6.4%
i566936
 
4.9%
t537857
 
4.6%
a530621
 
4.6%
n527560
 
4.5%
o497686
 
4.3%
s430617
 
3.7%
r398458
 
3.4%
l366458
 
3.2%
Other values (82)5459429
46.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6810897
58.6%
Uppercase Letter2968066
25.5%
Space Separator1563291
 
13.4%
Decimal Number177838
 
1.5%
Other Punctuation96107
 
0.8%
Dash Punctuation6337
 
0.1%
Open Punctuation3051
 
< 0.1%
Close Punctuation3006
 
< 0.1%
Math Symbol273
 
< 0.1%
Currency Symbol62
 
< 0.1%
Other values (2)10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e750025
 
11.0%
i566936
 
8.3%
t537857
 
7.9%
a530621
 
7.8%
n527560
 
7.7%
o497686
 
7.3%
s430617
 
6.3%
r398458
 
5.9%
l366458
 
5.4%
d296457
 
4.4%
Other values (16)1908222
28.0%
Uppercase Letter
ValueCountFrequency (%)
E317034
 
10.7%
I242750
 
8.2%
N222256
 
7.5%
T221118
 
7.4%
S214555
 
7.2%
A214253
 
7.2%
O211596
 
7.1%
R174216
 
5.9%
L159276
 
5.4%
D140110
 
4.7%
Other values (16)850902
28.7%
Other Punctuation
ValueCountFrequency (%)
.63525
66.1%
,17132
 
17.8%
/10546
 
11.0%
'2515
 
2.6%
&907
 
0.9%
"635
 
0.7%
;263
 
0.3%
#170
 
0.2%
:156
 
0.2%
@89
 
0.1%
Other values (5)169
 
0.2%
Decimal Number
ValueCountFrequency (%)
543365
24.4%
138792
21.8%
030811
17.3%
419052
10.7%
214599
 
8.2%
69982
 
5.6%
36180
 
3.5%
75266
 
3.0%
85078
 
2.9%
94713
 
2.7%
Math Symbol
ValueCountFrequency (%)
+211
77.3%
>42
 
15.4%
=10
 
3.7%
<8
 
2.9%
~2
 
0.7%
Open Punctuation
ValueCountFrequency (%)
(3004
98.5%
[47
 
1.5%
Close Punctuation
ValueCountFrequency (%)
)2983
99.2%
]23
 
0.8%
Modifier Symbol
ValueCountFrequency (%)
`4
80.0%
^1
 
20.0%
Space Separator
ValueCountFrequency (%)
1563291
100.0%
Dash Punctuation
ValueCountFrequency (%)
-6337
100.0%
Currency Symbol
ValueCountFrequency (%)
$62
100.0%
Connector Punctuation
ValueCountFrequency (%)
_5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin9778963
84.1%
Common1849975
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e750025
 
7.7%
i566936
 
5.8%
t537857
 
5.5%
a530621
 
5.4%
n527560
 
5.4%
o497686
 
5.1%
s430617
 
4.4%
r398458
 
4.1%
l366458
 
3.7%
E317034
 
3.2%
Other values (42)4855711
49.7%
Common
ValueCountFrequency (%)
1563291
84.5%
.63525
 
3.4%
543365
 
2.3%
138792
 
2.1%
030811
 
1.7%
419052
 
1.0%
,17132
 
0.9%
214599
 
0.8%
/10546
 
0.6%
69982
 
0.5%
Other values (30)38880
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII11628938
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1563291
 
13.4%
e750025
 
6.4%
i566936
 
4.9%
t537857
 
4.6%
a530621
 
4.6%
n527560
 
4.5%
o497686
 
4.3%
s430617
 
3.7%
r398458
 
3.4%
l366458
 
3.2%
Other values (82)5459429
46.9%

search_basis
Categorical

HIGH CARDINALITY
MISSING

Distinct: 721
Distinct (%): 0.8%
Missing: 321160
Missing (%): 78.8%
Memory size: 3.1 MiB

Incident to arrest: 39048
Condition of parole / probation/ PRCS / mandatory supervision: 23248
Consent given: 5589
Officer Safety/safety of others: 3961
Vehicle inventory: 1963
Other values (716): 12715

Length

Max length: 182
Median length: 174
Mean length: 34.48750636
Min length: 13

Characters and Unicode

Total characters: 2983997
Distinct characters: 34
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
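As a rough illustration of how per-category character counts like the ones in this report can be derived, the sketch below uses Python's standard `unicodedata` module. It is not necessarily how the report itself computes them; the function name `category_counts` and the sample strings are illustrative assumptions. The two-letter codes map to the labels used here: `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, `Zs` is Space Separator, `Po` is Other Punctuation, and `Sm` is Math Symbol. The pipe character `|` falls in `Sm`, which is why pipe-delimited columns show a Math Symbol category.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category over an iterable of strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            # unicodedata.category returns a two-letter code such as "Ll" or "Sm".
            counts[unicodedata.category(ch)] += 1
    return counts


# Illustrative sample values in the style of the search_basis column.
sample = ["Incident to arrest", "Consent given|Incident to arrest"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```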

Unique

Unique: 396
Unique (%): 0.5%

Sample

1st row: Vehicle inventory
2nd row: Incident to arrest
3rd row: Incident to arrest
4th row: Incident to arrest
5th row: Incident to arrest

Common Values

Value: Count (Frequency %)
Incident to arrest: 39048 (9.6%)
Condition of parole / probation/ PRCS / mandatory supervision: 23248 (5.7%)
Consent given: 5589 (1.4%)
Officer Safety/safety of others: 3961 (1.0%)
Vehicle inventory: 1963 (0.5%)
Condition of parole / probation/ PRCS / mandatory supervision|Incident to arrest: 1034 (0.3%)
Visible contraband: 938 (0.2%)
Incident to arrest|Officer Safety/safety of others: 911 (0.2%)
Incident to arrest|Condition of parole / probation/ PRCS / mandatory supervision: 715 (0.2%)
Consent given|Incident to arrest: 564 (0.1%)
Other values (711): 8553 (2.1%)
(Missing): 321160 (78.8%)
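
Several of the common values are pipe-delimited combinations, and the same combination can appear in more than one order (for example `Condition of parole / probation/ PRCS / mandatory supervision|Incident to arrest` and its reverse). A minimal sketch of one way to count individual values and collapse order variants is shown below. It assumes `|` is the only delimiter and that order carries no meaning; the helper names `count_individual_values` and `canonical_form` are hypothetical.

```python
from collections import Counter


def count_individual_values(rows, sep="|"):
    """Count how many rows mention each individual value of a multi-select field."""
    totals = Counter()
    for row in rows:
        if row is None:  # missing cell
            continue
        # Use a set so a value repeated inside one row is counted once.
        for part in {p.strip() for p in row.split(sep)}:
            if part:
                totals[part] += 1
    return totals


def canonical_form(row, sep="|"):
    """Sort the parts so that 'A|B' and 'B|A' map to the same string."""
    return sep.join(sorted(p.strip() for p in row.split(sep)))


# Illustrative rows, not taken from the dataset itself.
rows = [
    "Incident to arrest",
    "Consent given|Incident to arrest",
    "Incident to arrest|Consent given",
    None,
]
totals = count_individual_values(rows)
print(totals)
print(canonical_form(rows[1]) == canonical_form(rows[2]))
```

Applying the canonical form before counting distinct values would likely reduce the reported cardinality, since order variants would no longer count separately.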

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
53566
12.3%
to45600
10.5%
arrest42100
9.7%
incident42093
9.7%
of36452
8.4%
probation26783
 
6.2%
prcs26783
 
6.2%
mandatory26783
 
6.2%
parole26783
 
6.2%
condition25187
 
5.8%
Other values (128)83147
19.1%

Most occurring characters

ValueCountFrequency (%)
348753
11.7%
o294376
 
9.9%
n268287
 
9.0%
t257125
 
8.6%
r222991
 
7.5%
e214760
 
7.2%
i209486
 
7.0%
a176466
 
5.9%
s129107
 
4.3%
d105784
 
3.5%
Other values (24)756862
25.4%

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 2321650 (77.8%)
Space Separator: 348753 (11.7%)
Uppercase Letter: 213531 (7.2%)
Other Punctuation: 88097 (3.0%)
Math Symbol: 11966 (0.4%)

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o294376
12.7%
n268287
11.6%
t257125
11.1%
r222991
9.6%
e214760
9.3%
i209486
9.0%
a176466
7.6%
s129107
 
5.6%
d105784
 
4.6%
p83631
 
3.6%
Other values (12)359637
15.5%
Uppercase Letter
ValueCountFrequency (%)
C62715
29.4%
I45600
21.4%
S36333
17.0%
R26783
12.5%
P26783
12.5%
O8261
 
3.9%
V4990
 
2.3%
E1652
 
0.8%
W414
 
0.2%
Space Separator
ValueCountFrequency (%)
348753
100.0%
Other Punctuation
ValueCountFrequency (%)
/88097
100.0%
Math Symbol
ValueCountFrequency (%)
|11966
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2535181
85.0%
Common448816
 
15.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o294376
11.6%
n268287
10.6%
t257125
10.1%
r222991
 
8.8%
e214760
 
8.5%
i209486
 
8.3%
a176466
 
7.0%
s129107
 
5.1%
d105784
 
4.2%
p83631
 
3.3%
Other values (21)573168
22.6%
Common
ValueCountFrequency (%)
348753
77.7%
/88097
 
19.6%
|11966
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII2983997
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
348753
11.7%
o294376
 
9.9%
n268287
 
9.0%
t257125
 
8.6%
r222991
 
7.5%
e214760
 
7.2%
i209486
 
7.0%
a176466
 
5.9%
s129107
 
4.3%
d105784
 
3.5%
Other values (24)756862
25.4%

search_basis_exp
Categorical

HIGH CARDINALITY
MISSING

Distinct: 28990
Distinct (%): 45.7%
Missing: 344258
Missing (%): 84.4%
Memory size: 3.1 MiB

incident to arrest: 2366
search incident to arrest: 1541
arrest: 1321
arrested: 800
Incident to arrest: 772
Other values (28985): 56626

Length

Max length: 250
Median length: 236
Mean length: 27.76601078
Min length: 3

Characters and Unicode

Total characters: 1761087
Distinct characters: 88
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 24904
Unique (%): 39.3%

Sample

1st row: IMPOUNDED
2nd row: search incident to arrest
3rd row: search incident to arrest
4th row: Male drunk in public
5th row: 273.6 viloation of TRO

Common Values

Value: Count (Frequency %)
incident to arrest: 2366 (0.6%)
search incident to arrest: 1541 (0.4%)
arrest: 1321 (0.3%)
arrested: 800 (0.2%)
Incident to arrest: 772 (0.2%)
INCIDENT TO ARREST: 652 (0.2%)
searched incident to arrest: 572 (0.1%)
consent: 550 (0.1%)
5150 hold: 516 (0.1%)
consent search: 465 (0.1%)
Other values (28980): 53871 (13.2%)
(Missing): 344258 (84.4%)
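
This free-text column lists the same phrase under several capitalisations: `incident to arrest` (2366), `Incident to arrest` (772), and `INCIDENT TO ARREST` (652), which together account for 3790 rows. The sketch below shows one possible normalisation (case folding plus whitespace collapsing) before counting. The function name `normalise` and the sample list are illustrative assumptions, and a real cleanup would probably also need to handle spelling variants.

```python
from collections import Counter


def normalise(text):
    """Case-fold and collapse runs of whitespace so trivial variants compare equal."""
    return " ".join(text.casefold().split())


# Illustrative values in the style of the search_basis_exp column.
values = [
    "incident to arrest",
    "Incident to arrest",
    "INCIDENT  TO ARREST",
    "search incident to arrest",
]
merged = Counter(normalise(v) for v in values)
print(merged.most_common())
```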

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
to20323
 
7.0%
arrest20146
 
7.0%
for13693
 
4.7%
incident12131
 
4.2%
search9435
 
3.3%
arrested9119
 
3.2%
subject8539
 
3.0%
was8139
 
2.8%
searched6252
 
2.2%
and5808
 
2.0%
Other values (7320)175627
60.7%

Most occurring characters

ValueCountFrequency (%)
226198
 
12.8%
e138232
 
7.8%
r116668
 
6.6%
t104510
 
5.9%
a100275
 
5.7%
n85671
 
4.9%
s78939
 
4.5%
o76673
 
4.4%
i63628
 
3.6%
c59719
 
3.4%
Other values (78)710574
40.3%

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 1099455 (62.4%)
Uppercase Letter: 359554 (20.4%)
Space Separator: 226198 (12.8%)
Decimal Number: 54950 (3.1%)
Other Punctuation: 14716 (0.8%)
Close Punctuation: 2745 (0.2%)
Open Punctuation: 2743 (0.2%)
Dash Punctuation: 699 (< 0.1%)
Math Symbol: 14 (< 0.1%)
Currency Symbol: 10 (< 0.1%)

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e138232
12.6%
r116668
10.6%
t104510
9.5%
a100275
9.1%
n85671
 
7.8%
s78939
 
7.2%
o76673
 
7.0%
i63628
 
5.8%
c59719
 
5.4%
d58870
 
5.4%
Other values (16)216270
19.7%
Uppercase Letter
ValueCountFrequency (%)
E39943
11.1%
R33124
9.2%
A32432
9.0%
T31278
 
8.7%
S30653
 
8.5%
N26150
 
7.3%
O23930
 
6.7%
I22840
 
6.4%
C20263
 
5.6%
D19228
 
5.3%
Other values (16)79713
22.2%
Other Punctuation
ValueCountFrequency (%)
.9951
67.6%
,2588
 
17.6%
/1104
 
7.5%
&597
 
4.1%
'293
 
2.0%
:62
 
0.4%
;53
 
0.4%
"40
 
0.3%
#15
 
0.1%
?4
 
< 0.1%
Other values (4)9
 
0.1%
Decimal Number
ValueCountFrequency (%)
514267
26.0%
112107
22.0%
08044
14.6%
45518
 
10.0%
24591
 
8.4%
63142
 
5.7%
72588
 
4.7%
32167
 
3.9%
91394
 
2.5%
81132
 
2.1%
Math Symbol
ValueCountFrequency (%)
+6
42.9%
=5
35.7%
>2
 
14.3%
<1
 
7.1%
Close Punctuation
ValueCountFrequency (%)
)2744
> 99.9%
]1
 
< 0.1%
Modifier Symbol
ValueCountFrequency (%)
^2
66.7%
`1
33.3%
Space Separator
ValueCountFrequency (%)
226198
100.0%
Open Punctuation
ValueCountFrequency (%)
(2743
100.0%
Dash Punctuation
ValueCountFrequency (%)
-699
100.0%
Currency Symbol
ValueCountFrequency (%)
$10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1459009
82.8%
Common302078
 
17.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e138232
 
9.5%
r116668
 
8.0%
t104510
 
7.2%
a100275
 
6.9%
n85671
 
5.9%
s78939
 
5.4%
o76673
 
5.3%
i63628
 
4.4%
c59719
 
4.1%
d58870
 
4.0%
Other values (42)575824
39.5%
Common
ValueCountFrequency (%)
226198
74.9%
514267
 
4.7%
112107
 
4.0%
.9951
 
3.3%
08044
 
2.7%
45518
 
1.8%
24591
 
1.5%
63142
 
1.0%
)2744
 
0.9%
(2743
 
0.9%
Other values (26)12773
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII1761087
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
226198
 
12.8%
e138232
 
7.8%
r116668
 
6.6%
t104510
 
5.9%
a100275
 
5.7%
n85671
 
4.9%
s78939
 
4.5%
o76673
 
4.4%
i63628
 
3.6%
c59719
 
3.4%
Other values (78)710574
40.3%

seiz_basis
Categorical

HIGH CORRELATION
MISSING

Distinct: 49
Distinct (%): 0.5%
Missing: 398568
Missing (%): 97.8%
Memory size: 3.1 MiB

Evidence: 2999
Contraband: 2090
Impound of vehicle: 1217
Contraband|Evidence: 1049
Evidence|Contraband: 551
Other values (44): 1210

Length

Max length: 76
Median length: 65
Mean length: 15.30759105
Min length: 8

Characters and Unicode

Total characters: 139544
Distinct characters: 30
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): 0.1%

Sample

1st row: Contraband
2nd row: Evidence|Impound of vehicle
3rd row: Contraband|Evidence
4th row: Impound of vehicle
5th row: Evidence

Common Values

Value: Count (Frequency %)
Evidence: 2999 (0.7%)
Contraband: 2090 (0.5%)
Impound of vehicle: 1217 (0.3%)
Contraband|Evidence: 1049 (0.3%)
Evidence|Contraband: 551 (0.1%)
Safekeeping as allowed by law/statute: 387 (0.1%)
Evidence|Impound of vehicle: 237 (0.1%)
Contraband|Evidence|Impound of vehicle: 106 (< 0.1%)
Abandoned property: 76 (< 0.1%)
Contraband|Impound of vehicle: 57 (< 0.1%)
Other values (39): 347 (0.1%)
(Missing): 398568 (97.8%)

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
evidence2999
19.9%
contraband2090
13.9%
of1787
11.9%
vehicle1662
11.0%
impound1307
8.7%
contraband|evidence1049
 
7.0%
as562
 
3.7%
allowed562
 
3.7%
by562
 
3.7%
evidence|contraband551
 
3.7%
Other values (38)1913
12.7%

Most occurring characters

ValueCountFrequency (%)
e17040
12.2%
n15843
 
11.4%
d11812
 
8.5%
a10977
 
7.9%
o8379
 
6.0%
i7575
 
5.4%
c7013
 
5.0%
v7011
 
5.0%
5928
 
4.2%
t5823
 
4.2%
Other values (20)42143
30.2%

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 118754 (85.1%)
Uppercase Letter: 11708 (8.4%)
Space Separator: 5928 (4.2%)
Math Symbol: 2592 (1.9%)
Other Punctuation: 562 (0.4%)

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e17040
14.3%
n15843
13.3%
d11812
9.9%
a10977
9.2%
o8379
 
7.1%
i7575
 
6.4%
c7013
 
5.9%
v7011
 
5.9%
t5823
 
4.9%
b4697
 
4.0%
Other values (12)22584
19.0%
Uppercase Letter
ValueCountFrequency (%)
E5224
44.6%
C4031
34.4%
I1786
 
15.3%
S563
 
4.8%
A104
 
0.9%
Space Separator
ValueCountFrequency (%)
5928
100.0%
Math Symbol
ValueCountFrequency (%)
|2592
100.0%
Other Punctuation
ValueCountFrequency (%)
/562
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin130462
93.5%
Common9082
 
6.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e17040
13.1%
n15843
12.1%
d11812
 
9.1%
a10977
 
8.4%
o8379
 
6.4%
i7575
 
5.8%
c7013
 
5.4%
v7011
 
5.4%
t5823
 
4.5%
E5224
 
4.0%
Other values (17)33765
25.9%
Common
ValueCountFrequency (%)
5928
65.3%
|2592
28.5%
/562
 
6.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII139544
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e17040
12.2%
n15843
 
11.4%
d11812
 
8.5%
a10977
 
7.9%
o8379
 
6.0%
i7575
 
5.4%
c7013
 
5.0%
v7011
 
5.0%
5928
 
4.2%
t5823
 
4.2%
Other values (20)42143
30.2%

prop_type
Categorical

HIGH CARDINALITY
MISSING

Distinct: 490
Distinct (%): 5.4%
Missing: 398568
Missing (%): 97.8%
Memory size: 3.1 MiB

Drugs/narcotics: 1551
Vehicle: 1194
Drug Paraphernalia: 1115
Drugs/narcotics|Drug Paraphernalia: 794
Other Contraband or evidence: 586
Other values (485): 3876

Length

Max length: 202
Median length: 151
Mean length: 27.32305836
Min length: 5

Characters and Unicode

Total characters: 249077
Distinct characters: 35
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 259
Unique (%): 2.8%

Sample

1st row: Drugs/narcotics
2nd row: Alcohol
3rd row: Drugs/narcotics|Money|Drug Paraphernalia
4th row: Vehicle
5th row: Firearm(s)|Ammunition|Cell phone(s) or electronic device(s)

Common Values

Value: Count (Frequency %)
Drugs/narcotics: 1551 (0.4%)
Vehicle: 1194 (0.3%)
Drug Paraphernalia: 1115 (0.3%)
Drugs/narcotics|Drug Paraphernalia: 794 (0.2%)
Other Contraband or evidence: 586 (0.1%)
Weapon(s) other than a firearm: 509 (0.1%)
Alcohol: 483 (0.1%)
Drug Paraphernalia|Drugs/narcotics: 236 (0.1%)
Cell phone(s) or electronic device(s): 201 (< 0.1%)
Suspected Stolen property: 176 (< 0.1%)
Other values (480): 2271 (0.6%)
(Missing): 398568 (97.8%)

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
paraphernalia2089
 
8.8%
or2046
 
8.6%
drug1570
 
6.6%
drugs/narcotics1551
 
6.5%
other1518
 
6.4%
vehicle1194
 
5.0%
contraband1171
 
4.9%
drugs/narcotics|drug1057
 
4.4%
evidence1054
 
4.4%
a876
 
3.7%
Other values (233)9672
40.6%

Most occurring characters

ValueCountFrequency (%)
r26702
 
10.7%
a22214
 
8.9%
e22150
 
8.9%
n15726
 
6.3%
14682
 
5.9%
c14101
 
5.7%
o13723
 
5.5%
i13625
 
5.5%
s11269
 
4.5%
t10895
 
4.4%
Other values (25)83990
33.7%

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 200742 (80.6%)
Uppercase Letter: 18711 (7.5%)
Space Separator: 14682 (5.9%)
Math Symbol: 4834 (1.9%)
Other Punctuation: 3760 (1.5%)
Close Punctuation: 3174 (1.3%)
Open Punctuation: 3174 (1.3%)

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r26702
13.3%
a22214
11.1%
e22150
11.0%
n15726
 
7.8%
c14101
 
7.0%
o13723
 
6.8%
i13625
 
6.8%
s11269
 
5.6%
t10895
 
5.4%
h9023
 
4.5%
Other values (10)41314
20.6%
Uppercase Letter
ValueCountFrequency (%)
D6775
36.2%
P3015
16.1%
C2046
 
10.9%
V1623
 
8.7%
O1171
 
6.3%
S1150
 
6.1%
A1028
 
5.5%
W876
 
4.7%
F548
 
2.9%
M479
 
2.6%
Space Separator
ValueCountFrequency (%)
14682
100.0%
Math Symbol
ValueCountFrequency (%)
|4834
100.0%
Other Punctuation
ValueCountFrequency (%)
/3760
100.0%
Close Punctuation
ValueCountFrequency (%)
)3174
100.0%
Open Punctuation
ValueCountFrequency (%)
(3174
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin219453
88.1%
Common29624
 
11.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
r26702
12.2%
a22214
 
10.1%
e22150
 
10.1%
n15726
 
7.2%
c14101
 
6.4%
o13723
 
6.3%
i13625
 
6.2%
s11269
 
5.1%
t10895
 
5.0%
h9023
 
4.1%
Other values (20)60025
27.4%
Common
ValueCountFrequency (%)
14682
49.6%
|4834
 
16.3%
/3760
 
12.7%
)3174
 
10.7%
(3174
 
10.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII249077
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r26702
 
10.7%
a22214
 
8.9%
e22150
 
8.9%
n15726
 
6.3%
14682
 
5.9%
c14101
 
5.7%
o13723
 
5.5%
i13625
 
5.5%
s11269
 
4.5%
t10895
 
4.4%
Other values (25)83990
33.7%

cont
Categorical

HIGH CARDINALITY

Distinct: 669
Distinct (%): 0.2%
Missing: 5
Missing (%): < 0.1%
Memory size: 3.1 MiB

None: 369726
Alcohol: 11476
Drugs/narcotics: 6526
Drug Paraphernalia: 4943
Drugs/narcotics|Drug Paraphernalia: 2538
Other values (664): 12470

Length

Max length: 194
Median length: 4
Mean length: 5.624027728
Min length: 4

Characters and Unicode

Total characters: 2292798
Distinct characters: 35
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 322
Unique (%): 0.1%

Sample

1st row: None
2nd row: None
3rd row: None
4th row: None
5th row: None

Common Values

Value: Count (Frequency %)
None: 369726 (90.7%)
Alcohol: 11476 (2.8%)
Drugs/narcotics: 6526 (1.6%)
Drug Paraphernalia: 4943 (1.2%)
Drugs/narcotics|Drug Paraphernalia: 2538 (0.6%)
Weapon(s) other than a firearm: 2222 (0.5%)
Other Contraband or evidence: 2068 (0.5%)
Drug Paraphernalia|Drugs/narcotics: 947 (0.2%)
Suspected Stolen property: 731 (0.2%)
Firearm(s): 665 (0.2%)
Other values (659): 5837 (1.4%)

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
none369726
81.8%
alcohol11476
 
2.5%
paraphernalia8064
 
1.8%
drugs/narcotics6526
 
1.4%
drug6396
 
1.4%
other5435
 
1.2%
or5160
 
1.1%
than3255
 
0.7%
a3255
 
0.7%
contraband3241
 
0.7%
Other values (301)29237
 
6.5%

Most occurring characters

ValueCountFrequency (%)
o431619
18.8%
e424236
18.5%
n418639
18.3%
N369726
16.1%
r86923
 
3.8%
a75823
 
3.3%
c48344
 
2.1%
44092
 
1.9%
l42132
 
1.8%
i37630
 
1.6%
Other values (25)313634
13.7%

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 1771285 (77.3%)
Uppercase Letter: 434995 (19.0%)
Space Separator: 44092 (1.9%)
Other Punctuation: 12790 (0.6%)
Math Symbol: 12012 (0.5%)
Open Punctuation: 8812 (0.4%)
Close Punctuation: 8812 (0.4%)

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o431619
24.4%
e424236
24.0%
n418639
23.6%
r86923
 
4.9%
a75823
 
4.3%
c48344
 
2.7%
l42132
 
2.4%
i37630
 
2.1%
s36002
 
2.0%
h34279
 
1.9%
Other values (10)135658
 
7.7%
Uppercase Letter
ValueCountFrequency (%)
N369726
85.0%
D23243
 
5.3%
A13323
 
3.1%
P10453
 
2.4%
C5160
 
1.2%
W3255
 
0.7%
O3241
 
0.7%
S3220
 
0.7%
F1719
 
0.4%
M1655
 
0.4%
Space Separator
ValueCountFrequency (%)
44092
100.0%
Other Punctuation
ValueCountFrequency (%)
/12790
100.0%
Math Symbol
ValueCountFrequency (%)
|12012
100.0%
Open Punctuation
ValueCountFrequency (%)
(8812
100.0%
Close Punctuation
ValueCountFrequency (%)
)8812
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2206280
96.2%
Common86518
 
3.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
o431619
19.6%
e424236
19.2%
n418639
19.0%
N369726
16.8%
r86923
 
3.9%
a75823
 
3.4%
c48344
 
2.2%
l42132
 
1.9%
i37630
 
1.7%
s36002
 
1.6%
Other values (20)235206
10.7%
Common
ValueCountFrequency (%)
44092
51.0%
/12790
 
14.8%
|12012
 
13.9%
(8812
 
10.2%
)8812
 
10.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII2292798
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o431619
18.8%
e424236
18.5%
n418639
18.3%
N369726
16.1%
r86923
 
3.8%
a75823
 
3.3%
c48344
 
2.1%
44092
 
1.9%
l42132
 
1.8%
i37630
 
1.6%
Other values (25)313634
13.7%

actions
Categorical

HIGH CARDINALITY

Distinct: 11672
Distinct (%): 2.9%
Missing: 5
Missing (%): < 0.1%
Memory size: 3.1 MiB

None: 246838
Curbside detention: 24321
Handcuffed or flex cuffed: 16308
Search of person was conducted|Handcuffed or flex cuffed: 9208
Handcuffed or flex cuffed|Search of person was conducted: 9022
Other values (11667): 101982

Length

Max length: 360
Median length: 4
Mean length: 28.12013619
Min length: 4

Characters and Unicode

Total characters: 11463989
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8059
Unique (%): 2.0%

Sample

1st row: Search of property was conducted|Vehicle impounded
2nd row: Curbside detention
3rd row: Patrol car detention|Handcuffed or flex cuffed|Search of person was conducted
4th row: Curbside detention|Handcuffed or flex cuffed|Search of person was conducted
5th row: None

Common Values

Value: Count (Frequency %)
None: 246838 (60.5%)
Curbside detention: 24321 (6.0%)
Handcuffed or flex cuffed: 16308 (4.0%)
Search of person was conducted|Handcuffed or flex cuffed: 9208 (2.3%)
Handcuffed or flex cuffed|Search of person was conducted: 9022 (2.2%)
Patrol car detention|Handcuffed or flex cuffed: 3113 (0.8%)
Curbside detention|Handcuffed or flex cuffed: 2976 (0.7%)
Patrol car detention|Search of person was conducted|Handcuffed or flex cuffed: 2743 (0.7%)
Person photographed: 2531 (0.6%)
Search of person was conducted|Handcuffed or flex cuffed|Search of property was conducted: 2520 (0.6%)
Other values (11662): 88099 (21.6%)

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
none246838
15.5%
was125241
 
7.8%
or117174
 
7.3%
of116125
 
7.3%
flex111013
 
6.9%
person90179
 
5.6%
cuffed51222
 
3.2%
conducted47480
 
3.0%
search45074
 
2.8%
handcuffed44529
 
2.8%
Other values (233)602459
37.7%

Most occurring characters

ValueCountFrequency (%)
e1462183
12.8%
1189655
 
10.4%
o1051750
 
9.2%
n837630
 
7.3%
d819897
 
7.2%
r725015
 
6.3%
f707926
 
6.2%
c705365
 
6.2%
t485449
 
4.2%
a474788
 
4.1%
Other values (28)3004331
26.2%

Most occurring categories

Value: Count (Frequency %)
Lowercase Letter: 9391409 (81.9%)
Space Separator: 1189655 (10.4%)
Uppercase Letter: 648121 (5.7%)
Math Symbol: 234804 (2.0%)

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1462183
15.6%
o1051750
11.2%
n837630
8.9%
d819897
8.7%
r725015
7.7%
f707926
7.5%
c705365
7.5%
t485449
 
5.2%
a474788
 
5.1%
u406584
 
4.3%
Other values (15)1714822
18.3%
Uppercase Letter
ValueCountFrequency (%)
N246838
38.1%
S116125
17.9%
H111013
17.1%
P81745
 
12.6%
C58772
 
9.1%
A16520
 
2.5%
V10181
 
1.6%
F6635
 
1.0%
E190
 
< 0.1%
I55
 
< 0.1%
Space Separator
ValueCountFrequency (%)
1189655
100.0%
Math Symbol
ValueCountFrequency (%)
|234804
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin10039530
87.6%
Common1424459
 
12.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1462183
14.6%
o1051750
10.5%
n837630
 
8.3%
d819897
 
8.2%
r725015
 
7.2%
f707926
 
7.1%
c705365
 
7.0%
t485449
 
4.8%
a474788
 
4.7%
u406584
 
4.0%
Other values (26)2362943
23.5%
Common
ValueCountFrequency (%)
1189655
83.5%
|234804
 
16.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII11463989
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1462183
12.8%
1189655
 
10.4%
o1051750
 
9.2%
n837630
 
7.3%
d819897
 
7.2%
r725015
 
6.3%
f707926
 
6.2%
c705365
 
6.2%
t485449
 
4.2%
a474788
 
4.1%
Other values (28)3004331
26.2%

act_consent
Categorical

HIGH CARDINALITY
MISSING

Distinct: 335
Distinct (%): 0.3%
Missing: 297641
Missing (%): 73.0%
Memory size: 3.1 MiB

NA|NA: 38272
NA|NA|NA: 29881
NA|NA|NA|NA: 17701
NA|NA|NA|NA|NA: 7547
NA|NA|NA|NA|NA|NA: 2403
Other values (330): 14239

Length

Max length: 35
Median length: 34
Mean length: 8.251428987
Min length: 1

Characters and Unicode

Total characters: 908012
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 95
Unique (%): 0.1%

Sample

1st row: NA|NA
2nd row: NA|NA|NA
3rd row: NA|NA|NA
4th row: NA|NA
5th row: NA|NA|NA

Common Values

Value: Count (Frequency %)
NA|NA: 38272 (9.4%)
NA|NA|NA: 29881 (7.3%)
NA|NA|NA|NA: 17701 (4.3%)
NA|NA|NA|NA|NA: 7547 (1.9%)
NA|NA|NA|NA|NA|NA: 2403 (0.6%)
Y|NA: 1814 (0.4%)
NA|Y: 1112 (0.3%)
Y|NA|NA: 1095 (0.3%)
NA|Y|NA: 892 (0.2%)
NA|NA|NA|NA|NA|NA|NA: 827 (0.2%)
Other values (325): 8499 (2.1%)
(Missing): 297641 (73.0%)
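
The `act_consent` values are pipe-delimited flags built from the characters `N`, `A`, and `Y`, and the column contains exactly as many `|` characters (234804) as the `actions` column. That suggests, but does not prove, that each flag lines up positionally with one action in the same row. The sketch below pairs the two fields under that assumption; the function name `pair_actions_with_consent` and the example row are hypothetical, and the positional alignment should be checked against the dataset's documentation before relying on it.

```python
def pair_actions_with_consent(actions, act_consent, sep="|"):
    """Pair each action with the consent flag at the same position.

    Assumes positional alignment between the two fields, which is an
    inference from matching delimiter counts, not a documented guarantee.
    """
    if actions is None or act_consent is None:
        return []  # nothing to pair when either cell is missing
    action_parts = actions.split(sep)
    consent_parts = act_consent.split(sep)
    if len(action_parts) != len(consent_parts):
        raise ValueError("actions and act_consent have different lengths")
    return list(zip(action_parts, consent_parts))


# Hypothetical row for illustration only.
pairs = pair_actions_with_consent(
    "Curbside detention|Search of person was conducted",
    "NA|Y",
)
print(pairs)
```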

Length

[Figure: histogram of lengths of the category]
ValueCountFrequency (%)
na|na38272
34.8%
na|na|na29881
27.2%
na|na|na|na17701
16.1%
na|na|na|na|na7547
 
6.9%
na|na|na|na|na|na2403
 
2.2%
y|na1814
 
1.6%
na|y1112
 
1.0%
y|na|na1095
 
1.0%
na|y|na892
 
0.8%
na|na|na|na|na|na|na827
 
0.8%
Other values (325)8499
 
7.7%

Most occurring characters

ValueCountFrequency (%)
N330813
36.4%
A328361
36.2%
|234804
25.9%
Y14034
 
1.5%

Most occurring categories

Value: Count (Frequency %)
Uppercase Letter: 673208 (74.1%)
Math Symbol: 234804 (25.9%)
25.9%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N330813
49.1%
A328361
48.8%
Y14034
 
2.1%
Math Symbol
ValueCountFrequency (%)
|234804
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin673208
74.1%
Common234804
 
25.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
N330813
49.1%
A328361
48.8%
Y14034
 
2.1%
Common
ValueCountFrequency (%)
|234804
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII908012
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N330813
36.4%
A328361
36.2%
|234804
25.9%
Y14034
 
1.5%

Interactions

[Figure: interaction plots (121 panels)]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
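
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the report's own implementation: the helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, ties are assumed to receive the mean of their rank positions, and the toy data is invented.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations.

    Assumes both inputs are non-constant (the 1/n factors cancel).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data (invented): y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy input the rank-based coefficient is essentially 1 while Pearson's r is lower, which is the difference between the two measures that the text describes.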

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
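
As a rough sketch of this formula and of the invariance property mentioned above, the following plain-Python snippet computes r for invented toy data, then again after shifting and rescaling both variables. The function name `pearson_r` and all values are illustrative assumptions; the inputs are assumed to be non-constant.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data (invented): roughly linear with a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Separate positive rescaling and shifting of each variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# Negating one variable flips the sign of r but not its magnitude.
r_flipped = pearson_r(x, [-b for b in y])
print(r, r_rescaled, r_flipped)
```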

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
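
The pair-counting definition above can be sketched as follows. This is an illustrative assumption, not the report's implementation: it follows the formula exactly as stated (often called τ-a), counts tied pairs as neither concordant nor discordant, and uses a hypothetical function name and invented data. Library implementations commonly apply a tie correction instead.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as stated in the text."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order this pair the same way
        elif sign < 0:
            discordant += 1   # the variables order this pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither total
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data (invented): one swapped pair out of six possible pairs.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.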

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
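
One way to read nullity correlation, sketched here under assumptions, is as the Pearson correlation between "value is present" indicators of two columns. The function name `nullity_correlation` is hypothetical, and the toy columns below are invented for illustration (loosely modeled on the `inters` and `block` fields, where one tends to be filled when the other is missing); this is not the code that produced the heatmap.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between presence indicators (1 = present, 0 = missing).

    Returns None when either column is entirely present or entirely missing,
    because the correlation is undefined for a constant indicator.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sa = sqrt(sum((v - ma) ** 2 for v in a))
    sb = sqrt(sum((v - mb) ** 2 for v in b))
    if sa == 0 or sb == 0:
        return None
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / (sa * sb)

# Toy columns (invented): each row has either an intersection or a block number.
inters = [None, None, "governor dr", None, "la jolla village dr", None]
block = [3500.0, 7500.0, None, 4400.0, None, 1000.0]
corr = nullity_correlation(inters, block)
print(corr)
```

A value near -1 means that one column tends to be present exactly when the other is missing; a value near +1 means the two are present or missing together.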
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index) | df_index | Unnamed: 0 | stop_id | pid | id | ori | agency | exp_years | date | time | dur | is_serv | assign_key | assign_words | inters | block | ldmk | street | hw_exit | is_school | school_name | city | beat | beat_name | is_student | lim_eng | age | gender_words | is_gendnc | gender_code | gendnc_code | lgbt | race | disability | reason_words | reasonid | reason_text | reason_detail | reason_exp | search_basis | search_basis_exp | seiz_basis | prop_type | cont | actions | act_consent
0 | 0 | 1 | 84362 | 1 | 84362_1 | CA0371100 | SD | 10 | 2019-01-01 | 00:15:07 | 30 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 3500.0 | NaN | UNIVERSITY | NaN | 0 | NaN | SAN DIEGO | 839 | Cherokee Point 839 | 0 | 1 | 30 | Male | 0 | 1 | NaN | No | hisp | None | Traffic Violation | 54116.0 | 27150(A) VC - INADEQUATE MUFFLERS (I) 54116 | Equipment Violation | LOUD EXHAUST | Vehicle inventory | IMPOUNDED | NaN | NaN | None | Search of property was conducted|Vehicle impounded | NA|NA
1 | 1 | 2 | 84364 | 1 | 84364_1 | CA0371100 | SD | 2 | 2019-01-01 | 00:15:16 | 10 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 7500.0 | NaN | hillside dr | NaN | 0 | NaN | LA JOLLA | 124 | La Jolla 124 | 0 | 0 | 44 | Female | 0 | 2 | NaN | No | white | None | Reasonable Suspicion | 53130.0 | 415(2) PC - LOUD/UNREASONABLE NOISE (I) 53130 | Officer witnessed commission of a crime | loud party | NaN | NaN | NaN | NaN | None | Curbside detention | NaN
2 | 2 | 3 | 84365 | 1 | 84365_1 | CA0371100 | SD | 1 | 2019-01-01 | 00:02:00 | 5 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 1300.0 | NaN | ocean blvd | NaN | 0 | NaN | SAN DIEGO | 122 | Pacific Beach 122 | 0 | 0 | 30 | Female | 0 | 2 | NaN | No | white | None | Reasonable Suspicion | 64005.0 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005 | Officer witnessed commission of a crime | stumbling back and forth, unable to maintain balance | Incident to arrest | search incident to arrest | NaN | NaN | None | Patrol car detention|Handcuffed or flex cuffed|Search of person was conducted | NA|NA|NA
3 | 3 | 4 | 84366 | 1 | 84366_1 | CA0371100 | SD | 1 | 2019-01-01 | 00:38:00 | 5 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 800.0 | NaN | garnet | NaN | 0 | NaN | SAN DIEGO | 122 | Pacific Beach 122 | 0 | 0 | 25 | Male | 0 | 1 | NaN | No | hisp | None | Reasonable Suspicion | 64005.0 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005 | Officer witnessed commission of a crime | fighting with security | Incident to arrest | search incident to arrest | NaN | NaN | None | Curbside detention|Handcuffed or flex cuffed|Search of person was conducted | NA|NA|NA
4 | 4 | 5 | 84369 | 1 | 84369_1 | CA0371100 | SD | 17 | 2019-01-01 | 01:06:41 | 2 | 1 | 1 | Patrol, traffic enforcement, field operations | NaN | 4400.0 | NaN | coronado | NaN | 0 | NaN | SAN DIEGO | 614 | Ocean Beach 614 | 0 | 1 | 40 | Male | 0 | 1 | NaN | No | black | None | Reasonable Suspicion | 32022.0 | 602 PC - TRESPASSING (M) 32022 | Matched suspect description | rc of male at vacant house | NaN | NaN | NaN | NaN | None | None | NaN
5 | 5 | 6 | 84370 | 1 | 84370_1 | CA0371100 | SD | 1 | 2019-01-01 | 01:11:05 | 5 | 0 | 1 | Patrol, traffic enforcement, field operations | governor dr | NaN | NaN | radcliffe | NaN | 0 | NaN | SAN DIEGO | 115 | University City 115 | 0 | 0 | 75 | Female | 0 | 2 | NaN | No | white | None | Traffic Violation | 54110.0 | 24601 VC - FAIL MAINT LIC PLATE LAMP (I) 54110 | Equipment Violation | no license plate lights | NaN | NaN | NaN | NaN | None | None | NaN
6 | 6 | 7 | 84371 | 1 | 84371_1 | CA0371100 | SD | 1 | 2019-01-01 | 01:15:56 | 60 | 0 | 1 | Patrol, traffic enforcement, field operations | la jolla village dr | NaN | NaN | villa la jolla dr | NaN | 0 | NaN | SAN DIEGO | 126 | Torrey Pines 126 | 0 | 0 | 45 | Male | 0 | 1 | NaN | No | white | None | Traffic Violation | 54056.0 | 20002 VC - HIT AND RUN (M) 54056 | Moving Violation | drive hit victim vehicle causing damage and minor injury and fled on foot | NaN | NaN | NaN | NaN | None | None | NaN
7 | 7 | 8 | 84372 | 1 | 84372_1 | CA0371100 | SD | 2 | 2019-01-01 | 01:10:54 | 10 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 1000.0 | NaN | pacific beach dr | NaN | 0 | NaN | SAN DIEGO | 122 | Pacific Beach 122 | 0 | 0 | 25 | Male | 0 | 1 | NaN | No | hisp | None | Reasonable Suspicion | 64005.0 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005 | Officer witnessed commission of a crime | fell in street | NaN | NaN | NaN | NaN | None | Curbside detention | NaN
8 | 8 | 9 | 84372 | 2 | 84372_2 | CA0371100 | SD | 2 | 2019-01-01 | 01:10:54 | 10 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 1000.0 | NaN | pacific beach dr | NaN | 0 | NaN | SAN DIEGO | 122 | Pacific Beach 122 | 0 | 0 | 23 | Female | 0 | 2 | NaN | No | hisp | None | Reasonable Suspicion | 64005.0 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005 | Officer witnessed commission of a crime | fell in street | NaN | NaN | NaN | NaN | None | Curbside detention | NaN
9 | 9 | 10 | 84373 | 1 | 84373_1 | CA0371100 | SD | 9 | 2019-01-01 | 01:10:52 | 5 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 300.0 | NaN | 5th Av | NaN | 0 | NaN | SAN DIEGO | 523 | Gaslamp 523 | 0 | 0 | 21 | Male | 0 | 1 | NaN | No | hisp | None | Reasonable Suspicion | 64005.0 | 647(F) PC - DISORD CONDUCT:ALCOHOL (M) 64005 | Officer witnessed commission of a crime | Male drunk in public unable to care for himself | Incident to arrest | Male drunk in public | NaN | NaN | None | Handcuffed or flex cuffed|Search of person was conducted | NA|NA

Last rows

(index) | df_index | Unnamed: 0 | stop_id | pid | id | ori | agency | exp_years | date | time | dur | is_serv | assign_key | assign_words | inters | block | ldmk | street | hw_exit | is_school | school_name | city | beat | beat_name | is_student | lim_eng | age | gender_words | is_gendnc | gender_code | gendnc_code | lgbt | race | disability | reason_words | reasonid | reason_text | reason_detail | reason_exp | search_basis | search_basis_exp | seiz_basis | prop_type | cont | actions | act_consent
407674 | 69804 | 69805 | 449687 | 3 | 449687_3 | CA0371100 | SD | 1 | 2021-06-30 | 15:35:00 | 60 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 100.0 | NaN | W San Ysidro blvd | NaN | 0 | NaN | SAN YSIDRO | 712 | San Ysidro 712 | 0 | 0 | 60 | Male | 0 | 1 | NaN | No | white | None | Traffic Violation | 54431.0 | 24951(B) VC - TURN SIGNAL VIOLATION (I) 54431 | Moving Violation | pulled over vehicle for not using turn signal for a lane change and for the 3rd brakelight being out. | NaN | NaN | NaN | NaN | None | Person removed from vehicle by order | NaN
407675 | 69805 | 69806 | 449692 | 1 | 449692_1 | CA0371100 | SD | 1 | 2021-06-30 | 22:46:00 | 20 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 4300.0 | NaN | University | NaN | 0 | NaN | SAN DIEGO | 832 | Teralta West 832 | 0 | 0 | 18 | Male | 0 | 1 | NaN | No | hisp | None | Traffic Violation | 54168.0 | 5204(A) VC - EXPIRED TABS/FAIL DISPLAY (I) 54168 | Non-moving Violation, including Registration Violation | displayed reg expired over 6 months | NaN | NaN | NaN | NaN | None | None | NaN
407676 | 69806 | 69807 | 449693 | 1 | 449693_1 | CA0371100 | SD | 1 | 2021-06-30 | 15:00:00 | 30 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 200.0 | NaN | Via De San Ysidro | NaN | 0 | NaN | SAN YSIDRO | 712 | San Ysidro 712 | 0 | 0 | 30 | Male | 0 | 1 | NaN | No | hisp | None | Traffic Violation | 54649.0 | 24603(D) VC - STOPLAMPS:VEH 2 REQUIRED (I) 54649 | Equipment Violation | Subject had a brakeligh that was out. | Condition of parole / probation/ PRCS / mandatory supervision | NaN | NaN | NaN | None | Person removed from vehicle by order|Curbside detention|Search of property was conducted|Search of person was conducted | NA|NA|NA|NA
407677 | 69807 | 69808 | 449693 | 2 | 449693_2 | CA0371100 | SD | 1 | 2021-06-30 | 15:00:00 | 30 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 200.0 | NaN | Via De San Ysidro | NaN | 0 | NaN | SAN YSIDRO | 712 | San Ysidro 712 | 0 | 0 | 30 | Female | 0 | 2 | NaN | No | hisp | None | Traffic Violation | 54649.0 | 24603(D) VC - STOPLAMPS:VEH 2 REQUIRED (I) 54649 | Equipment Violation | Subject had a brakelight that was out. | Condition of parole / probation/ PRCS / mandatory supervision | NaN | NaN | NaN | None | Search of property was conducted|Person removed from vehicle by order | NA|NA
407678 | 69808 | 69809 | 449694 | 1 | 449694_1 | CA0371100 | SD | 1 | 2021-06-30 | 23:30:00 | 5 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 5600.0 | NaN | ECB | NaN | 0 | NaN | SAN DIEGO | 821 | Rolando 821 | 0 | 0 | 25 | Male | 0 | 1 | NaN | No | white | None | Traffic Violation | 54168.0 | 5204(A) VC - EXPIRED TABS/FAIL DISPLAY (I) 54168 | Non-moving Violation, including Registration Violation | expired reg over 6 months | NaN | NaN | NaN | NaN | None | None | NaN
407679 | 69809 | 69810 | 449701 | 1 | 449701_1 | CA0371100 | SD | 1 | 2021-06-30 | 21:36:15 | 5 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 500.0 | NaN | Saturn | NaN | 0 | NaN | SAN DIEGO | 721 | Egger Highlands 721 | 0 | 0 | 50 | Male | 0 | 1 | NaN | No | black | None | Reasonable Suspicion | 53130.0 | 415(2) PC - LOUD/UNREASONABLE NOISE (I) 53130 | Matched suspect description | Radio call of large group of people at a vehicle making loud noise. | NaN | NaN | NaN | NaN | None | Curbside detention | NaN
407680 | 69810 | 69811 | 449709 | 1 | 449709_1 | CA0371100 | SD | 5 | 2021-06-30 | 23:29:46 | 10 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 300.0 | NaN | 17th st | NaN | 0 | NaN | SAN DIEGO | 521 | East Village 521 | 0 | 0 | 40 | Male | 0 | 1 | NaN | No | white | None | Traffic Violation | 54427.0 | 21800(D) VC - FAIL STOP/YIELD:INOP SIGN (I) 54427 | Moving Violation | didnt stop at stop sign | NaN | NaN | NaN | NaN | None | None | NaN
407681 | 69811 | 69812 | 449716 | 1 | 449716_1 | CA0371100 | SD | 1 | 2021-06-30 | 23:45:00 | 20 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 4200.0 | NaN | del sol ct | NaN | 0 | NaN | SAN DIEGO | 723 | Otay Mesa West 723 | 0 | 0 | 30 | Male | 0 | 1 | NaN | No | hisp | None | Reasonable Suspicion | 99999.0 | NA - XX AA - CODE NOT FOUND IN TABLE (X) 99999 | Matched suspect description | 415 - subj and later found to be in uncles driveway | NaN | NaN | NaN | NaN | None | Person removed from vehicle by order | NaN
407682 | 69812 | 69813 | 449726 | 1 | 449726_1 | CA0371100 | SD | 11 | 2021-06-30 | 15:54:00 | 12 | 0 | 1 | Patrol, traffic enforcement, field operations | 15 SOUTH / AERO DRIVE | NaN | NaN | NaN | NaN | 0 | NaN | SAN DIEGO | 313 | Kearney Mesa 313 | 0 | 0 | 40 | Female | 0 | 2 | NaN | No | hisp | None | Traffic Violation | 54566.0 | 23123(A) VC - USE CELLPH W/DRIV W/O HFD (I) 54566 | Moving Violation | CELL PHONE | NaN | NaN | NaN | NaN | None | None | NaN
407683 | 69813 | 69814 | 449933 | 1 | 449933_1 | CA0371100 | SD | 1 | 2021-06-30 | 17:45:00 | 120 | 0 | 1 | Patrol, traffic enforcement, field operations | NaN | 4200.0 | NaN | mISSION BLVD | NaN | 0 | NaN | SAN DIEGO | 122 | Pacific Beach 122 | 0 | 0 | 30 | Male | 0 | 1 | NaN | No | hisp | None | Reasonable Suspicion | 13219.0 | 245(A)(1) PC - ADW NOT FIREARM (F) 13219 | Matched suspect description | RADIO CALL REGARDING A MALE HITTING ANOTHER MALE WITH A CROWBAR | Incident to arrest | 245PC | NaN | NaN | None | Handcuffed or flex cuffed|Search of person was conducted|Curbside detention|Patrol car detention | NA|NA|NA|NA